DEV Community

Timevolt

List comprehensions, generators, and the walrus: what most Python devs overlook

Quick context

I was recently asked to speed up a nightly ETL job that read a 2 GB log file, pulled out timestamps, converted them to datetime objects, and then filtered for business‑hours entries. My first pass looked clean: a big list comprehension that built a list of all the parsed datetimes, then I fed that list into sum/filter/whatever else I needed. The job ran for … well, it never finished. My laptop started swapping, the OOM killer showed up, and I spent three hours staring at a memory profile before I realized the culprit was that eager list.

That “oh‑yeah‑me‑too” moment reminded me how easy it is to reach for a list comprehension because it looks neat, while forgetting that Python also gives us a lazy counterpart that can save both memory and time when we don’t actually need the whole collection at once. There are a couple of lesser‑known tricks tucked inside comprehensions and generator expressions that many developers miss entirely—sometimes to their detriment. Let’s walk through them.

The Insight

List comprehensions are eager; generator expressions are lazy. That’s the headline, but the real power shows up when you combine laziness with a few Python features that let you avoid repeated work or unnecessary allocations.

  1. The walrus operator (:=, Python 3.8+) inside a comprehension lets you capture an intermediate result without recomputing it. It’s a small piece of syntax that can save up to half the CPU work when an expensive call would otherwise run once in the filter and again in the output expression.
  2. Generator expressions can be dropped straight into any built‑in that accepts an iterable (sum, any, all, max, min, itertools.chain, etc.). Wrapping them in list() is a common, wasteful habit.
  3. Generators are single‑use. Once you’ve exhausted them, they’re empty forever—a gotcha that bites when you try to reuse the same expression for multiple passes.

Understanding these points helps you write code that scales, reads clearer, and avoids sneaky performance pits.
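
To make the eager/lazy distinction concrete, here is a minimal sketch comparing a list comprehension with the equivalent generator expression. The one million integers are an arbitrary stand-in for real data, and sys.getsizeof reports only the container's own size (not the objects it references), so treat the numbers as a rough indication rather than a full memory profile.

```python
import sys

N = 1_000_000  # arbitrary stand-in for "a lot of records"

squares_list = [n * n for n in range(N)]   # eager: all N results exist now
squares_gen = (n * n for n in range(N))    # lazy: nothing computed yet

list_bytes = sys.getsizeof(squares_list)   # grows with N (the pointer array)
gen_bytes = sys.getsizeof(squares_gen)     # small and independent of N

print(f"list: {list_bytes:,} bytes, generator: {gen_bytes:,} bytes")

# Both give the same total; the generator just produces one item at a time.
total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)
```

The exact byte counts vary by Python version and platform, which is why the snippet prints them instead of hard-coding an expected value.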

How (with code)

1. Avoiding duplicate work with the walrus

Suppose we need to keep only those log lines whose timestamp falls at or after 9 AM. Parsing the line into a datetime is relatively cheap, but imagine it were a costly regex or a database lookup. Without the walrus you either call the parser twice (once for the filter, once for the value you actually want to store) or fall back on a workaround that binds the result through a throwaway one‑element list:

# ❌ Pre-walrus workaround: a one-element list just to bind dt
from datetime import datetime

with open('server.log') as log:  # close the file when we are done
    filtered = [
        dt
        for line in log
        if line.startswith('2023-09')
        # Assumes the time (HH:MM:SS) is the second whitespace-separated field
        for dt in [datetime.strptime(line.split()[1], "%H:%M:%S")]
        if dt.hour >= 9
    ]

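That works and parses each line only once, but it’s noisy: the one‑element list exists purely to bind dt, and the intent is easy to miss. (CPython 3.9 and later optimize that list away, so the cost is mostly readability.) The walrus cleans it up: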

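# ✅ Using walrus (Python 3.8+) to parse once and reuse
from datetime import datetime

with open('server.log') as log:  # close the file when we are done
    filtered = [
        dt
        for line in log
        if line.startswith('2023-09')
        # dt is bound here and reused as the output expression above
        and (dt := datetime.strptime(line.split()[1], "%H:%M:%S")).hour >= 9
    ]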

Here dt is assigned inside the condition, used immediately for the hour test, and then available for the output expression. No duplicated parsing, no junk iterables. One thing to keep in mind: the walrus binds dt in the enclosing scope, so the name stays visible after the comprehension finishes. I once shaved 15% off a data‑cleaning script just by spotting a pattern like this and applying the walrus.
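
If you want to see the saving rather than take my word for it, here is a small sketch that counts parser calls. The sample lines, the parse_time helper, and the stats dictionaries are all made up for illustration; the only thing borrowed from the examples above is the "second field is HH:MM:SS" assumption.

```python
from datetime import datetime

# Hypothetical sample lines in "<date> <time> <message>" form.
LINES = [
    "2023-09-01 08:15:00 cache warmup",
    "2023-09-01 09:00:00 job started",
    "2023-08-31 23:59:59 previous month",
    "2023-09-01 17:45:10 job finished",
]

def parse_time(line, stats):
    """Parse the second field as HH:MM:SS and record that we were called."""
    stats["calls"] += 1
    return datetime.strptime(line.split()[1], "%H:%M:%S")

# Version A: the parser runs in the filter and again in the output expression.
stats_twice = {"calls": 0}
twice = [
    parse_time(line, stats_twice)
    for line in LINES
    if line.startswith("2023-09") and parse_time(line, stats_twice).hour >= 9
]

# Version B: the walrus binds the parsed value once and reuses it.
stats_once = {"calls": 0}
once = [
    dt
    for line in LINES
    if line.startswith("2023-09") and (dt := parse_time(line, stats_once)).hour >= 9
]

print(stats_twice["calls"], stats_once["calls"])  # → 5 3
```

Three lines pass the prefix check, so both versions parse three times in the filter; version A then parses the two surviving lines a second time. The closer the filter is to "keep everything", the closer the saving gets to half.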

2. Feeding generators directly to built‑ins

A common pattern I see in code reviews is:

total_chars = sum([len(line) for line in open('bigfile.txt')])

The list comprehension builds a full list of lengths in memory before sum even starts. For a modest file it’s fine, but for multi‑gigabyte inputs it’s unnecessary overhead. The generator expression does the same work lazily:

total_chars = sum(len(line) for line in open('bigfile.txt'))

sum pulls one length at a time, adds it to the running total, and discards it, so memory use stays roughly constant (bounded by the longest line) regardless of file size. The same principle applies to any, all, max, min, and functions from itertools like chain.from_iterable.
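
Laziness buys more than memory with any and all, because they stop pulling items as soon as the answer is known. The sketch below uses a handful of made-up log lines to show that any() leaves the rest of the source untouched; the variable names are illustrative only.

```python
sample = ["ok\n", "ok\n", "ERROR disk full\n", "ok\n", "ok\n"]  # made-up lines

# Wrap the data in an iterator so we can see how much of it gets consumed.
lines = iter(sample)

# any() stops at the first match, so later lines are never examined.
has_error = any(line.startswith("ERROR") for line in lines)

# Whatever is left in the iterator was never pulled by the generator.
remaining = list(lines)
print(has_error, remaining)  # → True ['ok\n', 'ok\n']

# max and all accept a generator expression in exactly the same way.
longest = max(len(line) for line in sample)
all_terminated = all(line.endswith("\n") for line in sample)
```

With sum([...]) or any([...]) the whole list would be built first, so the short-circuit would save nothing.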

3. The single‑use gotcha

Because generators are lazy, they maintain internal state. Once you’ve iterated through them, that state is exhausted. This leads to bugs that look like “my filter returned nothing the second time I used it.”

# Create a generator that yields line lengths
gen = (len(line) for line in open('medium.txt'))

# First consumption works fine
print(sum(gen))          # → e.g. 123456 (total characters in the file)

# Second consumption yields nothing: the generator is already exhausted
print(list(gen))         # → []   (empty!)

If you actually need to reuse the sequence, you have two options:

  • Convert to a list once, before anything else consumes it (lengths = list(gen)), and then reuse that list.
  • Or, if the source is cheap to re‑iterate, simply recreate the generator: gen = (len(line) for line in open('medium.txt')).
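
A third variation on "recreate the generator" is to hide the recreation behind an object whose __iter__ builds a fresh generator each time. This is only a sketch: LineLengths is a hypothetical name, and it assumes the underlying source (a list here) can itself be iterated more than once. For a file you would reopen the path inside __iter__ instead.

```python
class LineLengths:
    """Re-iterable view: every iter() call builds a fresh generator."""

    def __init__(self, lines):
        self._lines = lines  # must be re-iterable itself, e.g. a list

    def __iter__(self):
        return (len(line) for line in self._lines)

data = ["alpha\n", "be\n", "gamma\n"]  # made-up stand-in for file contents

# A plain generator expression is spent after one pass...
gen = (len(line) for line in data)
gen_first = sum(gen)
gen_second = list(gen)

# ...while the wrapper hands out a new generator on every pass.
lengths = LineLengths(data)
first_pass = sum(lengths)
second_pass = list(lengths)

print(gen_first, gen_second)     # → 15 []
print(first_pass, second_pass)   # → 15 [6, 3, 6]
```

Nothing is materialized up front, so this keeps the memory profile of a generator while behaving like a list from the caller's point of view.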

I once spent an afternoon debugging a report generator that showed zero sales after the first page because I’d accidentally reused a generator expression for pagination. Switching to a fresh generator (or materializing the needed slice) fixed it instantly.
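
For the pagination case specifically, one way to materialize only the slice you need is itertools.islice. The sketch below is not the code from that report generator; pages is a hypothetical helper and the sales figures are invented. It walks a single generator once and hands back one page at a time.

```python
from itertools import islice

def pages(iterable, size):
    """Yield successive lists of up to `size` items from a single pass."""
    iterator = iter(iterable)
    # islice pulls at most `size` items; an empty list means we are done.
    while page := list(islice(iterator, size)):
        yield page

sales = (amount for amount in [120, 75, 310, 42, 98])  # hypothetical data
paged = list(pages(sales, 2))
print(paged)  # → [[120, 75], [310, 42], [98]]
```

Each page is a real list, so it can be reused safely, while the underlying generator is still consumed only once. On Python 3.12 and later, itertools.batched offers similar behaviour out of the box (it yields tuples rather than lists).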

Why This Matters

Mastering these nuances isn’t about writing “clever” one‑liners; it’s about writing code that behaves predictably under real‑world loads.

  • Memory efficiency: Generator expressions let you process data streams that are larger than RAM without resorting to external libraries or custom iterators.
  • Performance: The walrus operator can roughly halve the cost of an expensive computation that would otherwise appear in both a filter and an output expression.
  • Correctness: Knowing that generators are exhausted after one pass prevents subtle bugs where a pipeline works on its first pass but silently yields nothing on the second, with no exception to flag the problem.

When you internalize these habits, you start to spot opportunities for laziness everywhere—reading files, querying APIs, chaining transformations—and you reach for the right tool without second‑guessing. That’s the kind of intuition that separates “it works” from “it scales.”

Challenge

Take a recent piece of code where you used a list comprehension (maybe a filtering‑and‑mapping pipeline). Rewrite it as a generator expression and see if you can apply the walrus operator to avoid any duplicated work. If the original comprehension was used multiple times, think about whether you really need to materialize the list or if recreating the generator is cheaper.

Drop your before/after snippets in the comments—I’m curious to see what patterns you uncover. Happy coding!
