Zach Maddox

Chunking Java Streams the Right Way — Finally, a Collector That Feels Like It Should Be in the JDK

If you've ever needed to split a large list or stream into evenly sized chunks, you already know the pain:

  • You write a loop.
  • Or worse, nested loops.
  • Maybe a counter.
  • Maybe a temporary list.
  • Maybe something that almost works until one edge case blows it up.

Chunking elements is one of those everyday operations that somehow never made it into the JDK. So developers keep rewriting the same utility method in every project… slightly different each time.

After doing this one too many times—and hitting a PostgreSQL driver limitation that forced me to batch thousands of UUIDs into smaller chunks—I finally decided:

This should be a Collector.
Clean. Composable. Built for Streams.

So I built one.

This is Chunking Collector — a lightweight Java 8+ library that lets you express chunking in a way that reads like it belongs in the standard library.


🔥 The Old Way: Manual Chunking (A Bit of a Mess)

Here’s what most of us end up writing:

List<List<T>> chunks = new ArrayList<>();
List<T> current = new ArrayList<>();

for (T item : items) {
    current.add(item);
    if (current.size() == chunkSize) {
        chunks.add(current);
        current = new ArrayList<>();
    }
}

// easy to forget: the trailing partial chunk
if (!current.isEmpty()) {
    chunks.add(current);
}

It works… until it doesn’t:

  • Harder to read
  • Easy to get wrong
  • Not reusable
  • Not parallel-friendly
  • Not stream-friendly

It also breaks the flow of code that naturally wants to be expressed as a Stream pipeline.


✨ The New Way: A Collector That Just Works

With Chunking Collector, you simply write:

List<Integer> numbers = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

List<List<Integer>> chunks = numbers.stream()
    .collect(Chunking.toChunks(3));

Output:

[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]

That’s it.
Readable. Safe. Predictable.

This is how chunking should feel.


🧩 Why a Collector?

Because chunking is fundamentally a mutable reduction, which is exactly what a Collector models:

  • A Stream goes in
  • A List of Lists comes out
  • No side effects
  • No mutation leaking out
  • Works naturally with ordered, parallel, or sequential streams

And importantly, this fits the Stream philosophy perfectly:

stream.collect(Chunking.toChunks(size));

You immediately know what it does.
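For intuition, here is roughly what such a collector can look like if you build one yourself on top of Collector.of. This is just a sketch for illustration, not the library's actual implementation:

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;

// Illustrative sketch only -- not the library's implementation.
// The supplier creates the outer list, the accumulator starts a new inner list
// whenever the current one is full, and the combiner naively concatenates
// (a production-quality version must re-balance chunks at parallel merge points).
static <T> Collector<T, ?, List<List<T>>> toChunks(int size) {
    if (size <= 0) {
        throw new IllegalArgumentException("chunk size must be positive");
    }
    return Collector.of(
        () -> new ArrayList<List<T>>(),
        (chunks, item) -> {
            if (chunks.isEmpty() || chunks.get(chunks.size() - 1).size() == size) {
                chunks.add(new ArrayList<>());
            }
            chunks.get(chunks.size() - 1).add(item);
        },
        (left, right) -> { left.addAll(right); return left; }
    );
}

The three pieces map directly onto the bullets above: the supplier and accumulator do the reduction, and nothing escapes the collector, so no mutation leaks out. Getting the combiner right for parallel streams (re-balancing chunks at the merge boundary) is exactly the fiddly part you would rather have a library own.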


📦 Installation

<dependency>
  <groupId>dev.zachmaddox</groupId>
  <artifactId>chunking-collector</artifactId>
  <version>1.1.0</version>
</dependency>

or:

implementation 'dev.zachmaddox:chunking-collector:1.1.0'

🧠 Practical Examples That Come Up All the Time

1. Batch Processing

Chunking.chunk(records, 100)
    .forEach(batch -> processBatch(batch));

2. Database Paging

var pages = results.stream()
    .collect(Chunking.toChunks(500));

3. Parallel Workloads

Chunking.chunk(items, 10)
    .parallelStream()
    .forEach(this::processChunk);

🔥 The Real Origin: Working Around PostgreSQL IN-Clause Limits

The PostgreSQL JDBC driver (like many others) caps how many bind parameters a single statement can carry, so a large enough IN list will simply fail.

Chunking solves this cleanly and safely using parameterized SQL:

NamedParameterJdbcTemplate named = new NamedParameterJdbcTemplate(jdbcTemplate);

// Split the id list into driver-friendly batches and fan the queries out in parallel
var users = Chunking.chunk(ids, 500)
    .parallelStream()
    .map(chunk -> named.query(
        "SELECT * FROM users WHERE id IN (:ids)",  // parameterized IN clause
        Map.of("ids", chunk),                      // each batch stays under the driver's limit
        (rs, n) -> mapRow(rs)                      // your row-mapping helper
    ))
    .flatMap(List::stream)
    .toList();

The result:

  • No driver errors
  • Smaller, faster queries
  • Clear, maintainable code
  • Parallelizable workloads

This alone justified building the library.


⚡ Advanced Capabilities (When You Need Them)

Chunking Collector has grown into a flexible toolkit:

  • Remainder policies (INCLUDE_PARTIAL, DROP_PARTIAL)
  • Custom list factories
  • Lazy chunk streaming
  • Sliding windows
  • Boundary-based chunking
  • Weighted chunking
  • Primitive stream helpers

But the core API remains dead simple.
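To give one of those bullets a bit more shape: "sliding windows" means overlapping groups rather than disjoint chunks. Here is a hand-rolled version in plain Java purely for comparison; the window helper below is a made-up name, not the library's API:

import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hand-rolled sliding window, shown only to illustrate the concept --
// overlapping groups of `size`, stepping one element at a time.
// The `window` name is hypothetical; it is not the library's API.
static <T> List<List<T>> window(List<T> items, int size) {
    if (items.size() < size) {
        return List.of();
    }
    return IntStream.rangeClosed(0, items.size() - size)
        .mapToObj(i -> List.copyOf(items.subList(i, i + size)))
        .collect(Collectors.toList());
}

// window(List.of(1, 2, 3, 4), 2)  ->  [[1, 2], [2, 3], [3, 4]]

Contrast that with toChunks, which produces disjoint groups. The library's actual sliding-window, remainder-policy, and weighted-chunking APIs are documented in the JavaDoc linked below.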


🧩 Design Philosophy

“If this API ever became part of the JDK, nobody should be surprised.”

So:

  • No dependencies
  • No reflection
  • No magic
  • Just clean Java
  • Very small surface area
  • Behaves exactly how experienced Java devs expect

📚 Full Documentation

JavaDoc: https://zmad5306.github.io/chunking-collector/latest/
GitHub: https://github.com/zmad5306/chunking-collector
Maven Central: https://central.sonatype.com/artifact/dev.zachmaddox/chunking-collector


🎉 Final Thoughts

Chunking is a universal problem, and now there's finally a clean, reusable, stream-friendly solution for it.

If you’ve ever thought:

“Why isn’t there just a built-in way to do this?”

Well… now there is.

Give it a try, star the repo, and drop feedback — I’d love to hear how you’re using it.
