DEV Community

Cover image for What does 'batching' mean when we're talking about Apache KafkaⓇ?
Lucia Cerchie
Lucia Cerchie

Posted on

8

What does 'batching' mean when we're talking about Apache KafkaⓇ?

Today I learned that when you hear the word 'batch' in the context of Apache Kafka, it can mean one of two things:

  1. A reference to batch-only data processing systems. Batch-only systems process data in a bounded way. That means that there's a start time and an end-time. Whether this batching is done in large or micro-batches, it is processed all at once. That's in contrast to the continuous data streaming that Apache Kafka enables, in which data is processed in event-sized pieces.

  2. Within the data streaming context, there's something called producer batching. It's a bit of a misnomer because it's not really related to the batch-only data processing systems. A Kafka producer, the client that publishes records to the Kafka cluster, compresses messages via a process called batching to increase throughput. This batching is part of the process handling data at once and in event-sized pieces, so it doesn't mean the same thing as batch-only data processing.

In conclusion, 'batching' means, in a very general way, 'grouping stuff together'. But 'producer batching' and 'batch-only data processing systems' do not share the term in any significant sense, because they are referring to the completely different functions I described above.

Top comments (0)

A Workflow Copilot. Tailored to You.

Pieces.app image

Our desktop app, with its intelligent copilot, streamlines coding by generating snippets, extracting code from screenshots, and accelerating problem-solving.

Read the docs