The Python standard library ships with a module that most beginners skip right past: itertools. Once you start processing real data — CSV exports, API responses, log files, ebook chapters — you will reach for it constantly. The core idea is lazy iteration: instead of building a giant list in memory, itertools functions produce one item at a time, on demand.
What itertools Gives You
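itertools is built into Python, so no pip install is needed. Almost every function in it returns an iterator, not a list. That means:
- Memory use stays roughly constant for streaming functions such as chain, islice, and groupby, no matter how large the input is. (A few, including cycle, tee, product, and the combinatoric functions, keep a copy of their input.)
- You can chain operations together without intermediate allocations.
- It composes cleanly with generators, map, and filter.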
Import it once at the top of your script:
import itertools
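A minimal sketch of that composition, assuming a hypothetical infinite generator named numbers as the data source. Nothing is computed until islice asks for items, which is why the infinite source is safe here:

```python
import itertools

def numbers():
    """Hypothetical generator standing in for a large (here: infinite) source."""
    n = 0
    while True:
        yield n
        n += 1

evens = filter(lambda n: n % 2 == 0, numbers())   # lazy, no work done yet
squares = map(lambda n: n * n, evens)             # still lazy
first_five = list(itertools.islice(squares, 5))   # pulls exactly what it needs

print(first_five)  # [0, 4, 16, 36, 64]
```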
chain() — Concatenate Iterables Without Building a List
Suppose you have multiple lists of chapter titles pulled from different sources and you want to process them in one pass.
Before:
all_titles = chapter_titles + bonus_titles + appendix_titles
for title in all_titles:
    process(title)
After:
for title in itertools.chain(chapter_titles, bonus_titles, appendix_titles):
    process(title)
The chain version never builds a combined list. If each source has thousands of items, that matters. You can also flatten a list-of-lists with chain.from_iterable:
nested = [["ch1", "ch2"], ["bonus1"], ["app1", "app2"]]
for title in itertools.chain.from_iterable(nested):
    print(title)
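chain.from_iterable also accepts an iterable of generators, so the sources themselves can be lazy. A small sketch, assuming a hypothetical read_titles helper in place of real file or API reads:

```python
import itertools

def read_titles(prefix, count):
    """Hypothetical lazy source; a real one might read a file or call an API."""
    for i in range(1, count + 1):
        yield f"{prefix}{i}"

# A generator of generators: no source is opened until chain reaches it
sources = (read_titles(p, 2) for p in ["ch", "bonus", "app"])
titles = list(itertools.chain.from_iterable(sources))

print(titles)  # ['ch1', 'ch2', 'bonus1', 'bonus2', 'app1', 'app2']
```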
islice() — Take the First N Items from Any Iterable
islice is the slice operator for iterators. Generators do not support [0:10] syntax, but islice handles them cleanly.
import itertools
def load_rows(filename):
    with open(filename) as f:
        for line in f:
            yield line.strip()
# Preview first 5 rows without loading the whole file
for row in itertools.islice(load_rows("data.csv"), 5):
    print(row)
You can also skip rows with a start offset: islice(iterable, start, stop).
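A short sketch of the start, stop, and step arguments, using a hypothetical generator of row labels instead of a real file. Note that islice consumes the underlying iterator, so a second call continues where the first one stopped:

```python
import itertools

lines = (f"row{i}" for i in range(10))  # hypothetical stand-in for file rows

# Skip the first 2 items, then take items up to (not including) index 5
window = list(itertools.islice(lines, 2, 5))
print(window)  # ['row2', 'row3', 'row4']

# The generator has already advanced past row4; take every other remaining item
every_other = list(itertools.islice(lines, 0, None, 2))
print(every_other)  # ['row5', 'row7', 'row9']
```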
batched() — Process in Chunks of N
Added in Python 3.12, itertools.batched splits an iterable into fixed-size chunks. This is the pattern for rate-limited API calls, database batch inserts, or processing ebook pages in groups.
# Python 3.12+
from itertools import batched
results = range(100)
for batch in batched(results, 10):
    print(batch)  # tuple of up to 10 items
For Python 3.8 through 3.11, implement the same pattern manually (the := operator requires 3.8 or later):
def batched_compat(iterable, n):
    it = iter(iterable)
    # tuple() matches what itertools.batched yields; an empty tuple ends the loop
    while chunk := tuple(itertools.islice(it, n)):
        yield chunk
Both versions yield tuples of at most n items, with the final chunk smaller if the total does not divide evenly.
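To make the uneven final chunk concrete, here is a self-contained sketch of the fallback (written to return tuples, like itertools.batched) run on 23 items, plus an empty input:

```python
import itertools

def batched_compat(iterable, n):
    # Fallback sketch for Python 3.8+; returns tuples like itertools.batched
    it = iter(iterable)
    while chunk := tuple(itertools.islice(it, n)):
        yield chunk

sizes = [len(chunk) for chunk in batched_compat(range(23), 10)]
print(sizes)  # [10, 10, 3]

empty = list(batched_compat([], 10))
print(empty)  # []
```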
groupby() — Group Sorted Data by a Key
groupby scans an iterable and emits (key, group_iterator) pairs, starting a new group each time the key changes. The critical rule: the input must be sorted by the same key first, otherwise the same key will show up in several fragmented groups.
import itertools
rows = [
    {"category": "python", "title": "Generators"},
    {"category": "python", "title": "Decorators"},
    {"category": "bash", "title": "Cron Jobs"},
    {"category": "bash", "title": "Pipelines"},
]
# groupby only merges adjacent items, so sort by the grouping key first
rows.sort(key=lambda r: r["category"])
for category, group in itertools.groupby(rows, key=lambda r: r["category"]):
    titles = [r["title"] for r in group]
    print(f"{category}: {titles}")
Output:
bash: ['Cron Jobs', 'Pipelines']
python: ['Generators', 'Decorators']
This pattern maps directly onto grouping log lines by severity, CSV rows by date, or API records by status code.
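The sorting rule is easiest to see by running groupby on unsorted input. A small sketch with a hypothetical list of log severities:

```python
import itertools

levels = ["INFO", "ERROR", "INFO", "INFO", "ERROR"]  # hypothetical log severities

# Without sorting, each run of equal values becomes its own group
unsorted_groups = [(k, len(list(g))) for k, g in itertools.groupby(levels)]
print(unsorted_groups)  # [('INFO', 1), ('ERROR', 1), ('INFO', 2), ('ERROR', 1)]

# After sorting, each key appears exactly once
sorted_groups = [(k, len(list(g))) for k, g in itertools.groupby(sorted(levels))]
print(sorted_groups)  # [('ERROR', 2), ('INFO', 3)]
```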
product() — Cartesian Product Replaces Nested Loops
Nested for loops scanning a grid of options are a common source of hard-to-read code. product collapses them:
Before:
for size in ["small", "medium", "large"]:
    for color in ["red", "blue"]:
        for format_ in ["epub", "pdf"]:
            generate(size, color, format_)
After:
for size, color, format_ in itertools.product(
    ["small", "medium", "large"],
    ["red", "blue"],
    ["epub", "pdf"],
):
    generate(size, color, format_)
Same output, one indentation level, easier to extend. The repeat parameter lets you take the cartesian power of a single iterable: product("ABC", repeat=2) produces all two-letter pairs.
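As a quick illustration of the repeat parameter, this sketch joins each pair from product("ABC", repeat=2) into a string:

```python
import itertools

pairs = ["".join(p) for p in itertools.product("ABC", repeat=2)]

print(pairs)
# ['AA', 'AB', 'AC', 'BA', 'BB', 'BC', 'CA', 'CB', 'CC']
print(len(pairs))  # 9, i.e. 3 ** 2
```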
combinations() and permutations()
When you need to test all pairs from a set of values, or all ordered arrangements:
tags = ["python", "automation", "publishing"]
# All 2-tag pairs (order does not matter)
for pair in itertools.combinations(tags, 2):
    print(pair)
# ('python', 'automation'), ('python', 'publishing'), ('automation', 'publishing')
# All ordered arrangements of 2 tags
for pair in itertools.permutations(tags, 2):
    print(pair)
# ('python', 'automation'), ('python', 'publishing'), ('automation', 'python'), ...
combinations_with_replacement allows repeated elements. These are useful for generating test cases, keyword combinations for SEO experiments, or cover layout variants.
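A brief sketch of combinations_with_replacement on a two-tag list, showing the extra pairs where an element is matched with itself:

```python
import itertools

tags = ["python", "automation"]

# Order does not matter, but an element may be paired with itself
pairs = list(itertools.combinations_with_replacement(tags, 2))

print(pairs)
# [('python', 'python'), ('python', 'automation'), ('automation', 'automation')]
```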
repeat() and cycle() — Generating Test Data
repeat(value, n) yields the same value n times (or infinitely if n is omitted). cycle(iterable) loops through a sequence forever. Both are useful for pairing a static value with a stream:
# Tag every row with the same pipeline version
rows = ["row1", "row2", "row3"]
for row, version in zip(rows, itertools.repeat("v4.0")):
    print(row, version)
# Rotate through cover styles when generating previews
styles = itertools.cycle(["dark", "light", "minimal"])
for i in range(9):
    print(f"Cover {i}: {next(styles)}")
Combine repeat with chain to inject a separator between sections of output without building an intermediate list.
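One possible reading of that separator idea, as a sketch: the section names and the "---" marker are made up for illustration, and repeat is given a count of 1 so it yields the separator exactly once.

```python
import itertools

intro = ["Preface", "Introduction"]   # hypothetical sections
body = ["Chapter 1", "Chapter 2"]

separator = itertools.repeat("---", 1)  # yields "---" exactly once
lines = list(itertools.chain(intro, separator, body))

print(lines)
# ['Preface', 'Introduction', '---', 'Chapter 1', 'Chapter 2']
```

In a real pipeline the final list() call would usually be replaced by a for loop, so the combined sequence is never materialized.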
Real Pipeline Pattern: Batching 100 API Results
Here is how these tools combine in a realistic script — fetching 100 records from an API and processing them 10 at a time to stay under rate limits:
import itertools
import time
try:
    from itertools import batched  # Python 3.12+
except ImportError:
    def batched(iterable, n):
        """Fallback for Python 3.8 through 3.11."""
        it = iter(iterable)
        while chunk := tuple(itertools.islice(it, n)):
            yield chunk
def fetch_all_records():
    """Generator that yields all 100 records lazily."""
    for i in range(100):
        yield {"id": i, "title": f"Chapter {i}"}
def process_batch(batch):
    """Send a batch to an external API or write to disk."""
    ids = [r["id"] for r in batch]
    print(f"Processing batch: {ids}")
    time.sleep(0.1)  # simulate rate limit pause
for batch in batched(fetch_all_records(), 10):
    process_batch(batch)
At no point does this script hold all 100 records in memory. The generator, the batch wrapper, and the processing function form a lazy pipeline — exactly the pattern used in the full ebook publishing pipeline where thousands of metadata rows flow through validation, transformation, and upload stages.
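The laziness claim can be checked with a counter. This is a sketch, not part of the original script: the counter dictionary and the simplified record shape are assumptions added for illustration. After pulling one batch of 10, only 10 records have been produced.

```python
import itertools

counter = {"fetched": 0}  # hypothetical instrumentation, not in the original script

def fetch_all_records():
    for i in range(100):
        counter["fetched"] += 1   # count each record as it is produced
        yield {"id": i}

def batched_compat(iterable, n):
    it = iter(iterable)
    while chunk := tuple(itertools.islice(it, n)):
        yield chunk

batches = batched_compat(fetch_all_records(), 10)
first = next(batches)             # pull a single batch

print(len(first), counter["fetched"])  # 10 10
```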