DEV Community

loading...

Solution: Chunked Iterator, Python Riddle

orenovadia
・2 min read

This is a detailed solution to this riddle.

Here it is again: write a function (chunks) where the input is an iterator. And the output is an iterator of n sized iterators.

The first challenge is that the length of the original iterator is unknown. Let's start with a naive broken solution using itertools.islice to create n size iterators without caring about the length of the original iterator:

import itertools
from typing import TypeVar, Iterator

T = TypeVar('T')


def chunks(iterator: Iterator[T], n: int) -> Iterator[Iterator[T]]:
    while True:
        yield itertools.islice(iterator, 0, n)  # yield `n` size chunk of the iterator


for chunk in chunks(iter(range(10)), 3):
    print(f'A {type(chunk)} type chunk, contains items: {tuple(chunk)}')

The output is:

A <class 'itertools.islice'> type chunk, contains items: (0, 1, 2)
A <class 'itertools.islice'> type chunk, contains items: (3, 4, 5)
A <class 'itertools.islice'> type chunk, contains items: (6, 7, 8)
A <class 'itertools.islice'> type chunk, contains items: (9,)
A <class 'itertools.islice'> type chunk, contains items: ()
A <class 'itertools.islice'> type chunk, contains items: ()
.... forever

Using itertools.islice we managed to chunk up the original iterator, but we don't know when it is exhausted. And so, chunks is a generator function that never ends.

To overcome this problem we need to take one item out of the original iterator. Then, we'll use itertools.chain to create a chunk featuring this one item and n-1 more items.


def chunks(iterator: Iterator[T], n: int) -> Iterator[Iterator[T]]:
    for first in iterator: # take one item out (exits loop if `iterator` is empty)
        rest_of_chunk = itertools.islice(iterator, 0, n - 1)
        yield itertools.chain([first], rest_of_chunk)  # concatenate the first item back

The output is:

A <class 'itertools.chain'> type chunk, contains items: (0, 1, 2)
A <class 'itertools.chain'> type chunk, contains items: (3, 4, 5)
A <class 'itertools.chain'> type chunk, contains items: (6, 7, 8)
A <class 'itertools.chain'> type chunk, contains items: (9,)

And That is it!

And in comprehension form (I usually prefer comprehensions, but I must admit that this one is not as readable as the loop form):

    return (itertools.chain([first], itertools.islice(iterator, 0, n - 1))
            for first in iterator)

Performance

The runtime overhead of islice-ing and chain-ing is way smaller than writing your own custom code for chunks since both these functions are implemented in C (for the CPython runtime).

The Caveat

There is a caveat here: This whole solution assumes that the consumer of chunks is consuming the iterators fully and in order. If that is not the case, the order of items in our chunks might not be consistent with the original iterator, due to the laziness of chunks.

Example:

chunk_iterator = chunks(iter(range(10)), 3)
first_chunk = next(chunk_iterator)
second_chunk = next(chunk_iterator)
print(tuple(second_chunk))  # (1, 2, 3)
print(tuple(first_chunk))  # (0, 4, 5)

Cheers!

Discussion (0)