Introduction
When I started learning Apache Kafka, one of the first questions I had was: "How exactly does a Producer work?" The documentation was thorough, but I wanted to understand it through practical examples.
In this article, I'll walk you through what I learned about Kafka Producers, from sending your first message to understanding the mysterious "Sticky Partitioner." We'll build two simple Java programs and explore some interesting behaviors along the way.
This guide is based on the excellent course "Apache Kafka Series - Learn Apache Kafka for Beginners v3".
Part 1: My First Kafka Producer
Let's start with the simplest possible Kafka Producer that sends a single message.
Setting Up Producer Properties
Every Kafka Producer needs configuration. We use Java's Properties class for this:
```java
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "127.0.0.1:9092");
properties.setProperty("key.serializer", StringSerializer.class.getName());
properties.setProperty("value.serializer", StringSerializer.class.getName());
```
What's happening here?
- `bootstrap.servers`: The address of your Kafka broker(s)
- `key.serializer`: Converts your key object into bytes for transmission
- `value.serializer`: Converts your value object into bytes
Note: java.util.Properties is similar to Python's dict, but it only stores string key-value pairs. It's part of the Java standard library, not specific to Kafka.
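Since `Properties` is plain JDK, its behavior can be sketched without any Kafka dependency. Note that both keys and values are always strings, and `getProperty` can take a fallback default:

```java
import java.util.Properties;

public class PropertiesDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("batch.size", "400"); // keys and values are always Strings

        // getProperty returns null for a missing key, or the supplied default
        System.out.println(props.getProperty("batch.size"));     // prints 400
        System.out.println(props.getProperty("linger.ms", "0")); // prints 0
    }
}
```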
Creating the Producer
With properties configured, we can create our producer:
```java
KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
```
The generic types <String, String> represent the types for key and value respectively.
Creating and Sending a Message
```java
ProducerRecord<String, String> producerRecord =
        new ProducerRecord<>("demo_java", "hello world");

producer.send(producerRecord);
```
Here, "demo_java" is the topic name, and "hello world" is our message. Notice we didn't specify a key - it defaults to null.
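For comparison, `ProducerRecord` also has a constructor that takes an explicit key between the topic and the value. A minimal sketch (the key `"user-42"` is a made-up example):

```java
// With a key, every record sharing that key is hashed to the same partition.
ProducerRecord<String, String> keyedRecord =
        new ProducerRecord<>("demo_java", "user-42", "hello world");
producer.send(keyedRecord);
```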
The Critical Trio: send(), flush(), close()
This is where things get interesting. Here's what each method does:
send() - Asynchronous Operation
```java
producer.send(producerRecord);
```
- Asynchronous: The message goes into a buffer, it's NOT sent immediately
- If your program exits right after this, the message might never reach Kafka
flush() - Synchronous Operation
```java
producer.flush();
```
- Synchronous: Forces all buffered messages to be sent and blocks until complete
- Useful for learning/demos to ensure messages are sent
- Rarely used in production because it impacts performance
close() - Cleanup
```java
producer.close();
```
- Shuts down the Producer and releases resources
- Internally calls flush() to ensure all messages are sent
- MUST be called in production to prevent resource leaks
Here's the relationship:
```
 send()          flush()           close()
   ↓                ↓                 ↓
[Buffer] ------> [Send] ------> [Clean up]
 (async)         (sync)      (includes flush)
```
Best Practice:
- Always call `close()` in production
- Avoid calling `flush()` unless absolutely necessary
- Let Kafka handle batching automatically for better performance
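One way to make the `close()` guarantee hard to forget: `KafkaProducer` implements `Closeable`, so try-with-resources works. A sketch reusing the `properties` object from earlier:

```java
// close() runs automatically at the end of the block, even if send() throws,
// and close() flushes any buffered records before releasing resources.
try (KafkaProducer<String, String> producer = new KafkaProducer<>(properties)) {
    producer.send(new ProducerRecord<>("demo_java", "hello world"));
}
```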
Part 2: Producer with Callbacks
Now let's level up and add callbacks to track message metadata.
Why Use Callbacks?
Callbacks let you know when a message is successfully sent (or if it failed):
```java
producer.send(producerRecord, new Callback() {
    @Override
    public void onCompletion(RecordMetadata metadata, Exception e) {
        if (e == null) {
            // Success!
            log.info("Received new metadata\n" +
                    "Topic:\t" + metadata.topic() + "\n" +
                    "Partition:\t" + metadata.partition() + "\n" +
                    "Offset:\t" + metadata.offset() + "\n" +
                    "Timestamp:\t" + metadata.timestamp());
        } else {
            // Something went wrong
            log.error("Error while producing", e);
        }
    }
});
```
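Since `Callback` has a single method, the same thing can be written more compactly as a lambda:

```java
producer.send(producerRecord, (metadata, e) -> {
    if (e == null) {
        log.info("Sent to partition " + metadata.partition()
                + " at offset " + metadata.offset());
    } else {
        log.error("Error while producing", e);
    }
});
```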
What metadata can you get?
- `topic`: Which topic the message was sent to
- `partition`: Which partition number within that topic
- `offset`: The position of this message in the partition
- `timestamp`: When the message was created
Part 3: Understanding the Sticky Partitioner
This is where it gets really interesting. When I ran my producer sending 100 messages to a topic with 3 partitions, I noticed something odd: all messages went to partition 0.
Was my code broken? Not quite. I needed to understand the Sticky Partitioner.
What is the Sticky Partitioner?
Introduced in Kafka 2.4, sticky partitioning is the default strategy when messages don't have a key (the built-in partitioner falls back to it whenever the key is null; Kafka also ships a standalone UniformStickyPartitioner class).
How it works:
- Messages "stick" to one partition until a batch is full
- When the batch is sent, switch to a different partition
- Goal: Improve performance by reducing network requests
When Does a Batch Get Sent?
A batch is sent when ANY of these conditions is met:
- `batch.size`: Batch reaches the configured size (in bytes)
- `linger.ms`: Time limit reached (in milliseconds)
- `flush()`: Manually forced
Common Misconception: batch.size is NOT time!
I initially thought batch.size was related to time. It's not!
- `batch.size` = Size in bytes (default: 16384 bytes, i.e. 16 KB)
- `linger.ms` = Time in milliseconds (default: 0)
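The two settings work together: a batch is sent as soon as either limit is hit. A sketch of tuning both (the 32 KB / 20 ms values here are arbitrary examples, not recommendations):

```java
properties.setProperty("batch.size", "32768"); // send a batch at 32 KB...
properties.setProperty("linger.ms", "20");     // ...or after 20 ms, whichever comes first
```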
Demonstrating Sticky Partitioner Behavior
To observe partition switching, we need to make batches fill up quickly:
```java
// Make batches smaller so they fill up faster
properties.setProperty("batch.size", "400"); // 400 bytes instead of the 16 KB default
```
Why 400?
- Each message is roughly 15-20 bytes ("hello worldXX")
- 400 bytes ÷ 20 bytes ≈ 20 messages per batch
- When a batch fills up → it's sent → switches to next partition
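The arithmetic above can be checked directly (20 bytes per message is a rough assumption; real records also carry some protocol overhead):

```java
int batchSizeBytes = 400;   // our configured batch.size
int approxMsgBytes = 20;    // rough size of "hello worldXX"
int messagesPerBatch = batchSizeBytes / approxMsgBytes;
System.out.println(messagesPerBatch); // prints 20
```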
The Loop with Delay
```java
for (int i = 0; i < 100; i++) {
    if (i % 10 == 0) {
        try {
            Thread.sleep(1000); // Pause every 10 messages so batches get flushed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
    ProducerRecord<String, String> producerRecord =
            new ProducerRecord<>("demo_java", "hello world" + i);
    producer.send(producerRecord, callback);
}
```
Expected Behavior
With 3 partitions and batch.size=400:
- First ~20 messages → Partition 0
- Next ~20 messages → Partition 1
- Next ~20 messages → Partition 2
- Cycle continues...
You can observe this in the callback logs showing the partition number!
Key Configuration Parameters
Here's a quick reference table:
| Property | Description | Default | Unit |
|---|---|---|---|
| `bootstrap.servers` | Kafka broker address | None | - |
| `key.serializer` | Key serializer class | None | - |
| `value.serializer` | Value serializer class | None | - |
| `batch.size` | Max batch size | 16384 | bytes |
| `linger.ms` | Max wait time before sending | 0 | milliseconds |
Keyed vs Keyless Messages
The partitioning behavior changes based on whether you specify a key:
| Scenario | Partitioner Behavior |
|---|---|
| With key | Messages with the same key go to the same partition (hash-based) |
| Without key | Uses Sticky Partitioner (batch-based) |
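A rough sketch of the hash-based rule for keyed messages (Kafka really uses a murmur2 hash of the serialized key; `String.hashCode` here is purely for illustration, and `"user-42"` is a made-up key):

```java
int numPartitions = 3;
String key = "user-42";
// floorMod keeps the result in [0, numPartitions) even for negative hash codes
int partition = Math.floorMod(key.hashCode(), numPartitions);
System.out.println(partition); // same key -> same partition, every time
```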
Troubleshooting Common Issues
Error: `Invalid value null for configuration value.serializer`

Cause: Typo in the property name
- ❌ `values.serializer` (with an extra "s")
- ✅ `value.serializer` (singular)
All Messages Going to Partition 0
Possible causes:
- Topic only has 1 partition
- `batch.size` is too large, so all messages fit in one batch
- Wrong delay logic (`i / 10` instead of `i % 10`)
Solution:
Check your topic configuration:
```bash
kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic demo_java
```
Adjust batch size:
```java
properties.setProperty("batch.size", "400");
```
Fix the delay logic:
```java
if (i % 10 == 0) { // Correct!
    Thread.sleep(1000);
}
```
Why Sticky Partitioner is Better Than Round-Robin
Old Way: Round-Robin (Pre-Kafka 2.4)
- Each message goes to a different partition in sequence
- 100 messages = potentially 100 network requests
- More overhead, more latency
New Way: Sticky Partitioner (Kafka 2.4+)
- Messages stick to one partition until batch is full
- 100 messages = maybe 5 network requests (assuming 20 messages per batch)
- Fewer network calls = better throughput
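The request-count claim is just ceiling division, assuming roughly 20 messages fit per batch (the estimate from the `batch.size` math earlier):

```java
int messages = 100;
int perBatch = 20; // assumption: ~20 messages per 400-byte batch
int roundRobinRequests = messages;                         // worst case: one request per record
int stickyRequests = (messages + perBatch - 1) / perBatch; // ceiling division
System.out.println(stickyRequests); // prints 5
```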
When to Use flush()
Good use cases for flush():
- Financial transaction systems (need immediate confirmation)
- Critical logging where every message must be guaranteed
Avoid flush() in:
- Regular log aggregation
- High-throughput data pipelines
- Real-time analytics (slight delay is acceptable)
Remember: Kafka's automatic batching is designed for performance. Only override it when you have a specific requirement!
Key Takeaways
- ✅ Properties configuration is the foundation - get it right
- ✅ send() is async - use flush() or close() to ensure delivery
- ✅ Callbacks provide metadata - use them to track message status
- ✅ Sticky Partitioner batches by partition - improves performance
- ✅ batch.size is in bytes, not time - linger.ms is the time setting
- ✅ Keyless messages use Sticky Partitioner
- ✅ Always call close() - prevents resource leaks
Conclusion
Understanding how Kafka Producers work is fundamental to building robust event-driven systems. The key insights I gained were:
- The asynchronous nature of `send()` and why `close()` is critical
- How callbacks provide visibility into message delivery
- The performance benefits of the Sticky Partitioner
- The importance of proper configuration (especially `batch.size` vs `linger.ms`)
These concepts form the foundation for more advanced Kafka patterns like transactions, idempotent producers, and exactly-once semantics.
This article is part of my learning journey through Apache Kafka. If you found it helpful, please give it a like and follow for more Kafka tutorials!
Course Reference: Apache Kafka Series - Learn Apache Kafka for Beginners v3