DEV Community

Byron Hsieh
Byron Hsieh

Posted on

Kafka Producer Deep Dive: From Basics to Production-Ready Configuration

Introduction

When I started learning Apache Kafka, the Producer seemed simple at first - just call send() and you're done, right?

Wrong.

As I dug deeper, I discovered a rich set of configurations that determine whether your messages are delivered reliably or potentially lost. Understanding these concepts transformed how I think about building data pipelines.

In this article, I'll walk you through everything I learned about Kafka Producers, from the basics to production-ready configurations.

This guide is based on the excellent course "Apache Kafka Series - Learn Apache Kafka for Beginners v3".


Table of Contents

  1. Producer Basics
  2. The Sticky Partitioner
  3. Producer Acknowledgements (acks)
  4. Producer Retries
  5. Idempotent Producer
  6. Production-Ready Configuration

1. Producer Basics

Setting Up a Producer

Every Kafka Producer starts with configuration:

Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "127.0.0.1:9092");
properties.setProperty("key.serializer", StringSerializer.class.getName());
properties.setProperty("value.serializer", StringSerializer.class.getName());

KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
Enter fullscreen mode Exit fullscreen mode

The Critical Trio: send(), flush(), close()

This is where things get interesting:

send() - Asynchronous Operation

producer.send(record);
Enter fullscreen mode Exit fullscreen mode
  • Asynchronous: The message goes into a buffer, NOT sent immediately
  • If your program exits right after this, the message might never reach Kafka

flush() - Synchronous Operation

producer.flush();
Enter fullscreen mode Exit fullscreen mode
  • Synchronous: Forces all buffered messages to be sent and blocks until complete
  • Useful for learning/demos, but rarely used in production (impacts performance)

close() - Cleanup

producer.close();
Enter fullscreen mode Exit fullscreen mode
  • Shuts down the Producer and releases resources
  • Internally calls flush() - ensures all messages are sent
  • MUST be called in production to prevent resource leaks
send()          flush()         close()
  ↓               ↓               ↓
[Buffer] -----> [Send] -----> [Clean up]
(async)        (sync)         (includes flush)
Enter fullscreen mode Exit fullscreen mode

Callbacks for Monitoring

Callbacks let you track message delivery:

producer.send(record, (metadata, exception) -> {
    if (exception == null) {
        log.info("Sent to topic: " + metadata.topic() +
                " partition: " + metadata.partition() +
                " offset: " + metadata.offset());
    } else {
        log.error("Error while producing", exception);
    }
});
Enter fullscreen mode Exit fullscreen mode

What you can get from metadata:

  • topic: Which topic the message was sent to
  • partition: Which partition number
  • offset: The position of this message in the partition
  • timestamp: When the message was created

2. The Sticky Partitioner

When I ran my producer sending 100 messages to a topic with 3 partitions, I noticed something odd: all messages went to partition 0.

Was my code broken? Not quite. This is the Sticky Partitioner at work.

How It Works

Introduced in Kafka 2.4+, the Sticky Partitioner is the default when messages don't have a key:

  • Messages "stick" to one partition until a batch is full
  • When the batch is sent, switch to a different partition
  • Goal: Reduce network requests by batching more efficiently

When Does a Batch Get Sent?

A batch is sent when ANY of these conditions is met:

Trigger Setting Default
Size limit batch.size 16384 bytes (16 KB)
Time limit linger.ms 0 ms
Manual flush() -

Common Misconception

batch.size is in BYTES, not message count!

// Make batches smaller to observe partition switching
properties.setProperty("batch.size", "400");  // 400 bytes
Enter fullscreen mode Exit fullscreen mode

Why Sticky is Better Than Round-Robin

Old Way (Pre-Kafka 2.4):

  • Each message goes to a different partition
  • 100 messages = potentially 100 network requests

New Way (Sticky Partitioner):

  • Messages batch together per partition
  • 100 messages = maybe 5 network requests
  • Fewer network calls = better throughput

3. Producer Acknowledgements (acks)

This is one of the most important producer configurations.

The Three Levels

Setting Behavior Data Loss Risk
acks=0 Fire and forget - don't wait for any confirmation High
acks=1 Wait for leader broker only Medium
acks=all Wait for leader + all in-sync replicas None

acks=0 (Fire and Forget)

Producer → [send] → Broker
              ↓
         (don't wait)
              ↓
         [continue]
Enter fullscreen mode Exit fullscreen mode
  • Producer doesn't wait for any acknowledgement
  • Highest throughput, but data loss is possible
  • Use case: Metrics collection where some loss is acceptable

acks=1 (Leader Only)

Producer → [send] → Leader Broker → [commit]
                          ↓
                    [send ack back]
                          ↓
                 Replicas sync later (background)
Enter fullscreen mode Exit fullscreen mode
  • Producer waits for leader to acknowledge
  • Problem: If leader fails before replication, data is lost
  • Was the default from Kafka 1.0 to 2.8

acks=all (Full Replication)

Producer → [send] → Leader → Replica 1 → [ack]
                      ↓           ↓
                  Replica 2 → [ack]
                      ↓
              [all acks received]
                      ↓
              [send ack to producer]
Enter fullscreen mode Exit fullscreen mode
  • Producer waits for leader AND all in-sync replicas (ISR)
  • No data loss (with proper configuration)
  • Default since Kafka 3.0

The min.insync.replicas Setting

acks=all works together with min.insync.replicas:

Replication Factor = 3
min.insync.replicas = 2

Scenario: Only 1 broker available
Result: Producer receives NOT_ENOUGH_REPLICAS exception
        → Better to fail than to lose data!
Enter fullscreen mode Exit fullscreen mode

Best practice formula:

Brokers that can fail = Replication Factor - min.insync.replicas

Example: RF=3, min.insync=2 → Can tolerate 1 broker failure
Enter fullscreen mode Exit fullscreen mode

4. Producer Retries

The Retry Mechanism

When sending fails, Kafka retries automatically:

|<------------ delivery.timeout.ms (default: 2 min) ------------>|
|                                                                  |
send() → [batch] → [send] → [fail] → [wait] → [retry] → [success]
                              ↑         ↑
                         network    retry.backoff.ms
                          error      (default: 100ms)
Enter fullscreen mode Exit fullscreen mode

Key Settings

Setting Default Description
retries 2147483647 (Kafka 2.1+) Max retry attempts
retry.backoff.ms 100 Wait time between retries
delivery.timeout.ms 120000 (2 min) Upper bound for total delivery time

Understanding delivery.timeout.ms

This is the most important timeout setting:

// Total time from send() to success or failure
properties.setProperty("delivery.timeout.ms", "120000"); // 2 minutes
Enter fullscreen mode Exit fullscreen mode

What happens when timeout is reached:

producer.send(record, (metadata, exception) -> {
    if (exception instanceof TimeoutException) {
        // delivery.timeout.ms exceeded
        // Message was NOT delivered - handle it!
    }
});
Enter fullscreen mode Exit fullscreen mode

The Out-of-Order Problem

With retries enabled and max.in.flight.requests.per.connection > 1:

Timeline:
1. Send batch A (messages 1-10)
2. Send batch B (messages 11-20)
3. Batch A fails (network error)
4. Batch B succeeds ← commits first!
5. Batch A retries and succeeds

Result in Kafka: [11-20, 1-10]  ← Out of order!
Enter fullscreen mode Exit fullscreen mode

Solution: Use Idempotent Producer (next section)


5. Idempotent Producer

The Duplicate Problem

Without idempotence, network errors can cause duplicates:

1. Producer sends message
2. Kafka commits message to log
3. Kafka sends ack
4. Ack is LOST (network error)
5. Producer thinks it failed → retries
6. Kafka commits AGAIN → Duplicate!
Enter fullscreen mode Exit fullscreen mode

The Solution

properties.setProperty("enable.idempotence", "true");
Enter fullscreen mode Exit fullscreen mode

How It Works

Each producer gets a Producer ID (PID) and each message batch gets a sequence number:

Producer (PID=1) sends:
  Batch 1: seq=0 → Kafka commits
  Batch 1: seq=0 → Kafka says "already have seq=0, ignoring"

Result: No duplicates!
Enter fullscreen mode Exit fullscreen mode

What Idempotence Automatically Sets

When you enable idempotence, Kafka automatically configures:

Setting Value Why
acks all Need confirmation from all replicas
retries MAX_VALUE Retry until delivery.timeout.ms
max.in.flight.requests.per.connection 5 With ordering guaranteed!

The Magic of Ordering with max.in.flight=5

You might wonder: "How can we have 5 in-flight requests AND maintain order?"

Answer: Kafka uses sequence numbers to detect out-of-order batches and rejects them, forcing the producer to retry in correct order.

Batch A (seq=0) fails, Batch B (seq=1) arrives first
Kafka: "I expected seq=0, got seq=1 - rejecting!"
Producer retries Batch A, then Batch B
Order maintained!
Enter fullscreen mode Exit fullscreen mode

Default Since Kafka 3.0

Idempotent producer is enabled by default in Kafka 3.0+, but explicitly enable it for older versions or for clarity.


6. Production-Ready Configuration

Safe Producer Settings

Properties properties = new Properties();

// Connection
properties.setProperty("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092");
properties.setProperty("key.serializer", StringSerializer.class.getName());
properties.setProperty("value.serializer", StringSerializer.class.getName());

// Reliability (Kafka 3.0+ defaults, but explicit is better)
properties.setProperty("enable.idempotence", "true");
properties.setProperty("acks", "all");
properties.setProperty("retries", String.valueOf(Integer.MAX_VALUE));
properties.setProperty("max.in.flight.requests.per.connection", "5");

// Timeouts
properties.setProperty("delivery.timeout.ms", "120000");  // 2 minutes
properties.setProperty("request.timeout.ms", "30000");    // 30 seconds
Enter fullscreen mode Exit fullscreen mode

High Throughput Settings (Optional)

// Batching - wait a bit to collect more messages
properties.setProperty("linger.ms", "20");
properties.setProperty("batch.size", String.valueOf(32 * 1024)); // 32KB

// Compression - reduce network bandwidth
properties.setProperty("compression.type", "snappy");  // or "lz4", "zstd"
Enter fullscreen mode Exit fullscreen mode

Configuration Cheat Sheet

┌─────────────────────────────────────────────────────────────┐
│                    PRODUCER SETTINGS                        │
├─────────────────────────────────────────────────────────────┤
│  RELIABILITY                                                │
│  ├── enable.idempotence = true     (prevent duplicates)    │
│  ├── acks = all                    (full replication)      │
│  └── retries = MAX_VALUE           (bounded by timeout)    │
├─────────────────────────────────────────────────────────────┤
│  TIMEOUTS                                                   │
│  ├── delivery.timeout.ms = 120000  (total time bound)      │
│  ├── request.timeout.ms = 30000    (per request)           │
│  └── retry.backoff.ms = 100        (between retries)       │
├─────────────────────────────────────────────────────────────┤
│  THROUGHPUT                                                 │
│  ├── linger.ms = 20                (batch collection time) │
│  ├── batch.size = 32768            (batch size in bytes)   │
│  └── compression.type = snappy     (reduce network I/O)    │
├─────────────────────────────────────────────────────────────┤
│  BROKER/TOPIC (not producer settings)                       │
│  ├── replication.factor = 3                                 │
│  └── min.insync.replicas = 2                               │
└─────────────────────────────────────────────────────────────┘
Enter fullscreen mode Exit fullscreen mode

Key Takeaways

  1. send() is asynchronous - always call close() to ensure delivery

  2. Sticky Partitioner batches messages by partition for better throughput

  3. acks=all + min.insync.replicas=2 = no data loss (with RF=3)

  4. delivery.timeout.ms is the upper bound for all retries

  5. Idempotent Producer prevents duplicates AND maintains ordering

  6. Default since Kafka 3.0 - idempotence is on, but be explicit


Conclusion

The Kafka Producer is deceptively simple on the surface but incredibly powerful when you understand its internals:

  • Batching improves throughput
  • Acknowledgements control durability
  • Retries handle transient failures
  • Idempotence prevents duplicates

The key insight: Kafka's defaults have gotten much better over time. In Kafka 3.0+, you get idempotent, exactly-once producer semantics out of the box. But understanding why these settings matter helps you tune them for your specific use case.


This article is part of my learning journey through Apache Kafka. If you found it helpful, please give it a like and follow for more tutorials!

Course Reference: Apache Kafka Series - Learn Apache Kafka for Beginners v3

Top comments (0)