German Yamil

Posted on May 18

Python itertools: Efficient Loops Without the Boilerplate

#python #tutorial #codenewbie #beginners

The Python standard library ships with a module that most beginners skip right past: itertools. Once you start processing real data — CSV exports, API responses, log files, ebook chapters — you will reach for it constantly. The core idea is lazy iteration: instead of building a giant list in memory, itertools functions produce one item at a time, on demand.

🎁 Free: AI Publishing Checklist — 7 steps in Python · Full pipeline: germy5.gumroad.com/l/xhxkzz (pay what you want, min $9.99)

What itertools Gives You

itertools is built into Python, so no pip install is needed. Its functions return lazy iterators rather than lists. That means:

Memory use stays flat no matter how many items flow through. A few functions, such as cycle() and product(), do keep a copy of their input.
You can chain operations together without building intermediate lists.
It composes cleanly with generators, map, and filter.

Import it once at the top of your script:

import itertools
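Two properties of iterators surprise beginners. Nothing is computed until you ask for it, and each iterator can only be walked once. The following is a small sketch that makes both visible. The numbers() generator and the produced log are hypothetical helpers added for illustration.

```python
import itertools

produced = []  # log of which source items have actually been generated

def numbers():
    """Hypothetical source that logs each item as it is generated."""
    for n in range(5):
        produced.append(n)
        yield n

# Building the pipeline does no work yet: chain() only returns an iterator.
pipeline = itertools.chain(numbers(), ["done"])
print(produced)  # []

# Pulling two items generates exactly two source values.
first_two = list(itertools.islice(pipeline, 2))
print(first_two, produced)  # [0, 1] [0, 1]

# Iterators are single-pass: list() resumes where islice stopped...
rest = list(pipeline)
print(rest)  # [2, 3, 4, 'done']

# ...and once exhausted, the iterator yields nothing more.
leftover = list(pipeline)
print(leftover)  # []
```

If you need to walk the same data twice, keep the original list around or call the generator function again.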

chain() — Concatenate Iterables Without Building a List

Suppose you have multiple lists of chapter titles pulled from different sources and you want to process them in one pass.

Before:

all_titles = chapter_titles + bonus_titles + appendix_titles
for title in all_titles:
    process(title)

After:

for title in itertools.chain(chapter_titles, bonus_titles, appendix_titles):
    process(title)

The chain version never builds a combined list. If each source has thousands of items, that matters. You can also flatten a list-of-lists with chain.from_iterable:

nested = [["ch1", "ch2"], ["bonus1"], ["app1", "app2"]]
for title in itertools.chain.from_iterable(nested):
    print(title)
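chain() is not limited to lists. It accepts any mix of iterables, and chain.from_iterable flattens exactly one level of nesting. The titles below are made-up sample data.

```python
import itertools

chapter_titles = ["Intro", "Setup"]                 # a list
bonus_titles = ("Bonus: Tips",)                     # a tuple
appendix_titles = (f"Appendix {c}" for c in "AB")   # a generator

# chain() yields everything from the first iterable, then the next, and so on.
combined = list(itertools.chain(chapter_titles, bonus_titles, appendix_titles))
print(combined)
# ['Intro', 'Setup', 'Bonus: Tips', 'Appendix A', 'Appendix B']

# chain.from_iterable() removes only the outer level of nesting.
nested = [["ch1", ["deep"]], ["bonus1"]]
flat = list(itertools.chain.from_iterable(nested))
print(flat)  # ['ch1', ['deep'], 'bonus1']
```

Deeper nesting needs a recursive generator or repeated calls to chain.from_iterable.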

islice() — Take the First N Items from Any Iterable

islice is the slice operator for iterators. Generators do not support [0:10] syntax, but islice handles them cleanly.

import itertools

def load_rows(filename):
    with open(filename) as f:
        for line in f:
            yield line.strip()

# Preview first 5 rows without loading the whole file
for row in itertools.islice(load_rows("data.csv"), 5):
    print(row)

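You can also skip rows with a start offset, islice(iterable, start, stop), and pass a step as a fourth argument. Negative values are not allowed, because an iterator's length is unknown until it is exhausted.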
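Here is a sketch of the start, stop, and step forms. It uses an in-memory io.StringIO as a stand-in for a real CSV file, and the header and rows are invented sample data.

```python
import io
import itertools

# Stand-in for open("data.csv"): an in-memory file with a header row.
fake_file = io.StringIO("id,title\n1,Generators\n2,Decorators\n3,Cron Jobs\n")

# Skip the header (start=1) and stop before index 3, giving two data rows.
rows = [line.rstrip("\n") for line in itertools.islice(fake_file, 1, 3)]
print(rows)  # ['1,Generators', '2,Decorators']

# A step works like list slicing: every second item from 0 up to 10.
evens = list(itertools.islice(range(100), 0, 10, 2))
print(evens)  # [0, 2, 4, 6, 8]

# Negative indices are rejected with ValueError.
try:
    itertools.islice(range(10), -3, None)
    negative_ok = True
except ValueError:
    negative_ok = False
print(negative_ok)  # False
```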

batched() — Process in Chunks of N

Added in Python 3.12, itertools.batched splits an iterable into fixed-size chunks. This is the pattern for rate-limited API calls, database batch inserts, or processing ebook pages in groups.

# Python 3.12+
from itertools import batched

results = range(100)
for batch in batched(results, 10):
    print(batch)  # tuple of up to 10 items

For Python 3.11 and earlier, implement the same pattern manually:

import itertools

def batched_compat(iterable, n):
    # Mirrors itertools.batched by yielding tuples of at most n items.
    it = iter(iterable)
    while chunk := tuple(itertools.islice(it, n)):  # := needs Python 3.8+
        yield chunk

Both versions yield tuples of at most n items. The final chunk is shorter when the total does not divide evenly by n.
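One way to cover both Python versions in a single script is to try the standard-library import first and fall back to the islice version. This sketch uses that pattern to show the short final chunk. The 23 "pages" are a made-up example.

```python
import itertools
import math

try:
    from itertools import batched  # Python 3.12+
except ImportError:
    # Fallback for older versions; yields tuples just like batched().
    def batched(iterable, n):
        it = iter(iterable)
        while chunk := tuple(itertools.islice(it, n)):
            yield chunk

pages = range(23)  # e.g. 23 ebook pages, processed 10 at a time
sizes = [len(chunk) for chunk in batched(pages, 10)]
print(sizes)  # [10, 10, 3]

# The number of batches is the ceiling of total / batch size.
print(len(sizes) == math.ceil(len(pages) / 10))  # True
```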

groupby() — Group Sorted Data by a Key

groupby scans an iterable and emits (key, group_iterator) pairs, but it only merges consecutive items that share a key. The critical rule is that the input must first be sorted by the same key. Otherwise the same key shows up as several separate groups.

import itertools

rows = [
    {"category": "python", "title": "Generators"},
    {"category": "python", "title": "Decorators"},
    {"category": "bash",   "title": "Cron Jobs"},
    {"category": "bash",   "title": "Pipelines"},
]

rows.sort(key=lambda r: r["category"])

for category, group in itertools.groupby(rows, key=lambda r: r["category"]):
    titles = [r["title"] for r in group]
    print(f"{category}: {titles}")

Output:

bash: ['Cron Jobs', 'Pipelines']
python: ['Generators', 'Decorators']

This pattern maps directly onto grouping log lines by severity, CSV rows by date, or API records by status code.
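Here is a sketch of the two groupby pitfalls, using invented log levels. Unsorted input splits groups. Each group also shares the underlying iterator with groupby itself, so an earlier group is no longer usable once groupby moves on.

```python
import itertools

levels = ["ERROR", "INFO", "ERROR", "INFO", "INFO"]

# Unsorted input: groupby only merges consecutive equal keys.
unsorted_keys = [key for key, _ in itertools.groupby(levels)]
print(unsorted_keys)  # ['ERROR', 'INFO', 'ERROR', 'INFO']

# Sorted input: one group per distinct key.
counts = {key: len(list(group)) for key, group in itertools.groupby(sorted(levels))}
print(counts)  # {'ERROR': 2, 'INFO': 3}

# Collecting the pairs first advances groupby past the first group,
# so that group comes back empty.
pairs = list(itertools.groupby(sorted(levels)))
first_key, first_group = pairs[0]
first_items = list(first_group)
print(first_key, first_items)  # ERROR []

# Fix: turn each group into a list before moving on.
kept = [(key, list(group)) for key, group in itertools.groupby(sorted(levels))]
print(kept)  # [('ERROR', ['ERROR', 'ERROR']), ('INFO', ['INFO', 'INFO', 'INFO'])]
```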

product() — Cartesian Product Replaces Nested Loops

Nested for loops scanning a grid of options are a common source of hard-to-read code. product collapses them:

Before:

for size in ["small", "medium", "large"]:
    for color in ["red", "blue"]:
        for format_ in ["epub", "pdf"]:
            generate(size, color, format_)

After:

for size, color, format_ in itertools.product(
    ["small", "medium", "large"],
    ["red", "blue"],
    ["epub", "pdf"],
):
    generate(size, color, format_)

Same output in the same order, one indentation level, and easier to extend. The repeat parameter lets you take the cartesian power of a single iterable. For example, product("ABC", repeat=2) produces all 9 ordered pairs, from ('A', 'A') through ('C', 'C').
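The claim that product() matches the nested loops can be checked directly. In this sketch the rightmost iterable advances fastest, just like the innermost loop. The sizes and formats lists are trimmed-down sample data.

```python
import itertools

sizes = ["small", "large"]
formats = ["epub", "pdf"]

# The nested-loop version, collected into a list.
nested = []
for size in sizes:
    for format_ in formats:
        nested.append((size, format_))

# product() yields the same tuples in the same order.
same_order = list(itertools.product(sizes, formats)) == nested
print(same_order)  # True

# repeat=2 is shorthand for product("ABC", "ABC"): 3 * 3 = 9 ordered pairs.
pairs = ["".join(p) for p in itertools.product("ABC", repeat=2)]
print(pairs)  # ['AA', 'AB', 'AC', 'BA', 'BB', 'BC', 'CA', 'CB', 'CC']
```

Before yielding anything, product() reads all of its input iterables into memory, so it only suits finite inputs.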

combinations() and permutations()

When you need to test all pairs from a set of values, or all ordered arrangements:

tags = ["python", "automation", "publishing"]

# All 2-tag pairs (order does not matter)
for pair in itertools.combinations(tags, 2):
    print(pair)
# ('python', 'automation'), ('python', 'publishing'), ('automation', 'publishing')

# All ordered arrangements of 2 tags
for pair in itertools.permutations(tags, 2):
    print(pair)
# ('python', 'automation'), ('python', 'publishing'), ('automation', 'python'), ...

combinations_with_replacement allows repeated elements. These are useful for generating test cases, keyword combinations for SEO experiments, or cover layout variants.
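These generators grow quickly, so it helps to know the output size before looping. This sketch checks the three functions against the standard counting formulas in the math module (math.comb and math.perm need Python 3.8+).

```python
import itertools
import math

tags = ["python", "automation", "publishing"]
n, r = len(tags), 2

combos = list(itertools.combinations(tags, r))
perms = list(itertools.permutations(tags, r))
with_repl = list(itertools.combinations_with_replacement(tags, r))

# Sizes match C(n, r), P(n, r), and C(n + r - 1, r).
print(len(combos), math.comb(n, r))             # 3 3
print(len(perms), math.perm(n, r))              # 6 6
print(len(with_repl), math.comb(n + r - 1, r))  # 6 6

# combinations_with_replacement adds the "same tag twice" pairs.
print(with_repl[:3])
# [('python', 'python'), ('python', 'automation'), ('python', 'publishing')]
```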

repeat() and cycle() — Generating Test Data

repeat(value, n) yields the same value n times, or infinitely if n is omitted. cycle(iterable) loops through a sequence forever. Both pair naturally with zip(), which stops at its shortest input, so an infinite repeat() or cycle() is safe to combine with a finite stream:

# Tag every row with the same pipeline version
rows = ["row1", "row2", "row3"]
for row, version in zip(rows, itertools.repeat("v4.0")):
    print(row, version)

# Rotate through cover styles when generating previews
styles = itertools.cycle(["dark", "light", "minimal"])
for i in range(9):
    print(f"Cover {i}: {next(styles)}")

Combine repeat with chain to inject a separator between sections of output without building an intermediate list.
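Here is one way to do that, as a sketch. Zip the items with an endless repeat() of the separator, flatten the pairs with chain.from_iterable, and drop the leading separator with islice. The intersperse name is a hypothetical helper, not part of itertools.

```python
import itertools

def intersperse(items, separator):
    """Hypothetical helper: yield items with separator between each pair."""
    # zip(repeat(sep), items) -> (sep, a), (sep, b), (sep, c)
    pairs = zip(itertools.repeat(separator), items)
    # Flatten to sep, a, sep, b, sep, c, then skip the leading separator.
    return itertools.islice(itertools.chain.from_iterable(pairs), 1, None)

sections = ["Chapter 1", "Chapter 2", "Chapter 3"]
output = list(intersperse(sections, "---"))
print(output)
# ['Chapter 1', '---', 'Chapter 2', '---', 'Chapter 3']
```

Each step is lazy, so this also works when the sections come from a generator.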

Real Pipeline Pattern: Batching 100 API Results

Here is how these tools combine in a realistic script — fetching 100 records from an API and processing them 10 at a time to stay under rate limits:

import itertools
import time

def fetch_all_records():
    """Generator that yields all 100 records lazily."""
    for i in range(100):
        yield {"id": i, "title": f"Chapter {i}"}

def process_batch(batch):
    """Send a batch to an external API or write to disk."""
    ids = [r["id"] for r in batch]
    print(f"Processing batch: {ids}")
    time.sleep(0.1)  # simulate rate limit pause

try:
    from itertools import batched  # Python 3.12+
except ImportError:
    # Python 3.11 and earlier: the same behavior built from islice
    def batched(iterable, n):
        it = iter(iterable)
        while chunk := tuple(itertools.islice(it, n)):
            yield chunk

for batch in batched(fetch_all_records(), 10):
    process_batch(batch)

At no point does this script hold all 100 records in memory. The generator, the batch wrapper, and the processing function form a lazy pipeline — exactly the pattern used in the full ebook publishing pipeline where thousands of metadata rows flow through validation, transformation, and upload stages.
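To check the laziness claim yourself, you can instrument the generator with a counter and snapshot that counter each time a batch arrives. In this sketch the counter and the batches() helper are hypothetical additions. batches() is a minimal islice-based stand-in for batched(), so the snippet runs on any Python 3.8+.

```python
import itertools

counter = {"generated": 0}  # how many records the source has produced so far

def fetch_all_records():
    """Instrumented copy of the generator: counts records as it yields them."""
    for i in range(100):
        counter["generated"] += 1
        yield {"id": i, "title": f"Chapter {i}"}

def batches(iterable, n):
    """Hypothetical helper: tuples of at most n items, like batched()."""
    it = iter(iterable)
    while chunk := tuple(itertools.islice(it, n)):
        yield chunk

seen_counts = []
for batch in batches(fetch_all_records(), 10):
    # Snapshot how many records exist at the moment this batch is handed over.
    seen_counts.append(counter["generated"])

print(seen_counts)  # [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
```

The source never runs ahead of the consumer. When batch k is being processed, only 10 * k records have ever been generated.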

If you are building a document or content pipeline and want to apply this batch-processing approach at scale — including API calls, file generation, and KDP/Gumroad uploads — the full pipeline guide covers every stage end-to-end: germy5.gumroad.com/l/xhxkzz.

DEV Community