Introduction
When I started learning Apache Kafka, one of the first questions I had was: "How exactly does a Producer work?" The documentation was thorough, but I wanted to understand it through practical examples.
In this article, I'll walk you through what I learned about Kafka Producers, from sending your first message to understanding the mysterious "Sticky Partitioner." We'll build two simple Java programs and explore some interesting behaviors along the way.
This guide is based on the excellent course "Apache Kafka Series - Learn Apache Kafka for Beginners v3".
Part 1: My First Kafka Producer
Let's start with the simplest possible Kafka Producer that sends a single message.
Setting Up Producer Properties
Every Kafka Producer needs configuration. We use Java's Properties class for this:
```java
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "127.0.0.1:9092");
properties.setProperty("key.serializer", StringSerializer.class.getName());
properties.setProperty("value.serializer", StringSerializer.class.getName());
```
What's happening here?
- `bootstrap.servers`: The address of your Kafka broker(s)
- `key.serializer`: Converts your key object into bytes for transmission
- `value.serializer`: Converts your value object into bytes
Note: java.util.Properties is similar to Python's dict, but it only stores string key-value pairs. It's part of the Java standard library, not specific to Kafka.
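Since `Properties` is plain JDK, its behavior can be sketched without any Kafka dependency. Note that both keys and values are always strings, and `getProperty` can take a fallback default:

```java
import java.util.Properties;

public class PropertiesDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("batch.size", "400"); // keys and values are always Strings

        // getProperty returns null for a missing key, or the supplied default
        System.out.println(props.getProperty("batch.size"));     // prints 400
        System.out.println(props.getProperty("linger.ms", "0")); // prints 0
    }
}
```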
Creating the Producer
With properties configured, we can create our producer:
```java
KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
```
The generic types <String, String> represent the types for key and value respectively.
Creating and Sending a Message
```java
ProducerRecord<String, String> producerRecord =
        new ProducerRecord<>("demo_java", "hello world");

producer.send(producerRecord);
```
Here, "demo_java" is the topic name, and "hello world" is our message. Notice we didn't specify a key - it defaults to null.
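For comparison, `ProducerRecord` also has a constructor that takes an explicit key between the topic and the value. A minimal sketch (the key `"user-42"` is a made-up example):

```java
// With a key, every record sharing that key is hashed to the same partition.
ProducerRecord<String, String> keyedRecord =
        new ProducerRecord<>("demo_java", "user-42", "hello world");
producer.send(keyedRecord);
```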
The Critical Trio: send(), flush(), close()
This is where things get interesting. Here's what each method does:
send() - Asynchronous Operation
```java
producer.send(producerRecord);
```
- Asynchronous: The message goes into a buffer, it's NOT sent immediately
- If your program exits right after this, the message might never reach Kafka
flush() - Synchronous Operation
```java
producer.flush();
```
- Synchronous: Forces all buffered messages to be sent and blocks until complete
- Useful for learning/demos to ensure messages are sent
- Rarely used in production because it impacts performance
close() - Cleanup
```java
producer.close();
```
- Shuts down the Producer and releases resources
- Internally calls flush() to ensure all messages are sent
- MUST be called in production to prevent resource leaks
Here's the relationship:
```
 send()          flush()           close()
   ↓                ↓                 ↓
[Buffer] ------> [Send] ------> [Clean up]
 (async)         (sync)      (includes flush)
```
Best Practice:
- Always call `close()` in production
- Avoid calling `flush()` unless absolutely necessary
- Let Kafka handle batching automatically for better performance
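One way to make the `close()` guarantee hard to forget: `KafkaProducer` implements `Closeable`, so try-with-resources works. A sketch reusing the `properties` object from earlier:

```java
// close() runs automatically at the end of the block, even if send() throws,
// and close() flushes any buffered records before releasing resources.
try (KafkaProducer<String, String> producer = new KafkaProducer<>(properties)) {
    producer.send(new ProducerRecord<>("demo_java", "hello world"));
}
```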
Part 2: Producer with Callbacks
Now let's level up and add callbacks to track message metadata.
Why Use Callbacks?
Callbacks let you know when a message is successfully sent (or if it failed):
```java
producer.send(producerRecord, new Callback() {
    @Override
    public void onCompletion(RecordMetadata metadata, Exception e) {
        if (e == null) {
            // Success!
            log.info("Received new metadata\n" +
                    "Topic:\t" + metadata.topic() + "\n" +
                    "Partition:\t" + metadata.partition() + "\n" +
                    "Offset:\t" + metadata.offset() + "\n" +
                    "Timestamp:\t" + metadata.timestamp());
        } else {
            // Something went wrong
            log.error("Error while producing", e);
        }
    }
});
```
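Since `Callback` has a single method, the same thing can be written more compactly as a lambda:

```java
producer.send(producerRecord, (metadata, e) -> {
    if (e == null) {
        log.info("Sent to partition " + metadata.partition()
                + " at offset " + metadata.offset());
    } else {
        log.error("Error while producing", e);
    }
});
```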
What metadata can you get?
- `topic`: Which topic the message was sent to
- `partition`: Which partition number within that topic
- `offset`: The position of this message in the partition
- `timestamp`: When the message was created
Part 3: Understanding the Sticky Partitioner
This is where it gets really interesting. When I ran my producer sending 100 messages to a topic with 3 partitions, I noticed something odd: all messages went to partition 0.
Was my code broken? Not quite. I needed to understand the Sticky Partitioner.
What is the Sticky Partitioner?
Introduced in Kafka 2.4, sticky partitioning is the default strategy when messages don't have a key (the built-in partitioner falls back to it whenever the key is null; Kafka also ships a standalone UniformStickyPartitioner class).
How it works:
- Messages "stick" to one partition until a batch is full
- When the batch is sent, switch to a different partition
- Goal: Improve performance by reducing network requests
When Does a Batch Get Sent?
A batch is sent when ANY of these conditions is met:
- `batch.size`: Batch reaches the configured size (in bytes)
- `linger.ms`: Time limit reached (in milliseconds)
- `flush()`: Manually forced
Common Misconception: batch.size is NOT time!
I initially thought batch.size was related to time. It's not!
- `batch.size` = Size in bytes (default: 16384 bytes, i.e. 16 KB)
- `linger.ms` = Time in milliseconds (default: 0)
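The two settings work together: a batch is sent as soon as either limit is hit. A sketch of tuning both (the 32 KB / 20 ms values here are arbitrary examples, not recommendations):

```java
properties.setProperty("batch.size", "32768"); // send a batch at 32 KB...
properties.setProperty("linger.ms", "20");     // ...or after 20 ms, whichever comes first
```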
Demonstrating Sticky Partitioner Behavior
To observe partition switching, we need to make batches fill up quickly:
```java
// Make batches smaller so they fill up faster
properties.setProperty("batch.size", "400"); // 400 bytes instead of the 16 KB default
```
Why 400?
- Each message is roughly 15-20 bytes ("hello worldXX")
- 400 bytes ÷ 20 bytes ≈ 20 messages per batch
- When a batch fills up → it's sent → switches to next partition
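The arithmetic above can be checked directly (20 bytes per message is a rough assumption; real records also carry some protocol overhead):

```java
int batchSizeBytes = 400;   // our configured batch.size
int approxMsgBytes = 20;    // rough size of "hello worldXX"
int messagesPerBatch = batchSizeBytes / approxMsgBytes;
System.out.println(messagesPerBatch); // prints 20
```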
The Loop with Delay
```java
for (int i = 0; i < 100; i++) {
    if (i % 10 == 0) {
        try {
            Thread.sleep(1000); // Pause every 10 messages so batches get flushed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
    ProducerRecord<String, String> producerRecord =
            new ProducerRecord<>("demo_java", "hello world" + i);
    producer.send(producerRecord, callback);
}
```
Expected Behavior
With 3 partitions and batch.size=400:
- First ~20 messages → Partition 0
- Next ~20 messages → Partition 1
- Next ~20 messages → Partition 2
- Cycle continues...
You can observe this in the callback logs showing the partition number!
Key Configuration Parameters
Here's a quick reference table:
| Property | Description | Default | Unit |
|---|---|---|---|
| `bootstrap.servers` | Kafka broker address | None | - |
| `key.serializer` | Key serializer class | None | - |
| `value.serializer` | Value serializer class | None | - |
| `batch.size` | Max batch size | 16384 | bytes |
| `linger.ms` | Max wait time before sending | 0 | milliseconds |
Keyed vs Keyless Messages
The partitioning behavior changes based on whether you specify a key:
| Scenario | Partitioner Behavior |
|---|---|
| With key | Messages with the same key go to the same partition (hash-based) |
| Without key | Uses Sticky Partitioner (batch-based) |
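A rough sketch of the hash-based rule for keyed messages (Kafka really uses a murmur2 hash of the serialized key; `String.hashCode` here is purely for illustration, and `"user-42"` is a made-up key):

```java
int numPartitions = 3;
String key = "user-42";
// floorMod keeps the result in [0, numPartitions) even for negative hash codes
int partition = Math.floorMod(key.hashCode(), numPartitions);
System.out.println(partition); // same key -> same partition, every time
```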
Troubleshooting Common Issues
Error: `Invalid value null for configuration value.serializer`

Cause: Typo in the property name
- ❌ `values.serializer` (with an extra "s")
- ✅ `value.serializer` (singular)
All Messages Going to Partition 0
Possible causes:
- Topic only has 1 partition
- `batch.size` is too large, so all messages fit in one batch
- Wrong delay logic (`i / 10` instead of `i % 10`)
Solution:
Check your topic configuration:
```bash
kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic demo_java
```
Adjust batch size:
```java
properties.setProperty("batch.size", "400");
```
Fix the delay logic:
```java
if (i % 10 == 0) { // Correct!
    Thread.sleep(1000);
}
```
Why Sticky Partitioner is Better Than Round-Robin
Old Way: Round-Robin (Pre-Kafka 2.4)
- Each message goes to a different partition in sequence
- 100 messages = potentially 100 network requests
- More overhead, more latency
New Way: Sticky Partitioner (Kafka 2.4+)
- Messages stick to one partition until batch is full
- 100 messages = maybe 5 network requests (assuming 20 messages per batch)
- Fewer network calls = better throughput
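The request-count claim is just ceiling division, assuming roughly 20 messages fit per batch (the estimate from the `batch.size` math earlier):

```java
int messages = 100;
int perBatch = 20; // assumption: ~20 messages per 400-byte batch
int roundRobinRequests = messages;                         // worst case: one request per record
int stickyRequests = (messages + perBatch - 1) / perBatch; // ceiling division
System.out.println(stickyRequests); // prints 5
```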
When to Use flush()
Good use cases for flush():
- Financial transaction systems (need immediate confirmation)
- Critical logging where every message must be guaranteed
Avoid flush() in:
- Regular log aggregation
- High-throughput data pipelines
- Real-time analytics (slight delay is acceptable)
Remember: Kafka's automatic batching is designed for performance. Only override it when you have a specific requirement!
Key Takeaways
- ✅ Properties configuration is the foundation - get it right
- ✅ send() is async - use flush() or close() to ensure delivery
- ✅ Callbacks provide metadata - use them to track message status
- ✅ Sticky Partitioner batches by partition - improves performance
- ✅ batch.size is in bytes, not time - linger.ms is the time setting
- ✅ Keyless messages use Sticky Partitioner
- ✅ Always call close() - prevents resource leaks
Conclusion
Understanding how Kafka Producers work is fundamental to building robust event-driven systems. The key insights I gained were:
- The asynchronous nature of `send()` and why `close()` is critical
- How callbacks provide visibility into message delivery
- The performance benefits of the Sticky Partitioner
- The importance of proper configuration (especially `batch.size` vs `linger.ms`)
These concepts form the foundation for more advanced Kafka patterns like transactions, idempotent producers, and exactly-once semantics.
This article is part of my learning journey through Apache Kafka. If you found it helpful, please give it a like and follow for more Kafka tutorials!
Course Reference: Apache Kafka Series - Learn Apache Kafka for Beginners v3