
Rahul Singh

Posted on • Originally published at aicodereview.cc

Python List Comprehension: The Complete Guide (2026)

What are list comprehensions?

List comprehensions are one of Python's most distinctive features -- a concise, readable syntax for creating lists by transforming and filtering elements from existing iterables. They replace multi-line for loop patterns with a single expression that is both easier to read and faster to execute.

The basic syntax looks like this:

new_list = [expression for item in iterable]

This is equivalent to the following for loop:

new_list = []
for item in iterable:
    new_list.append(expression)

The difference is not just cosmetic. List comprehensions are optimized at the bytecode level. Python's compiler recognizes the pattern and uses a specialized LIST_APPEND opcode instead of the repeated attribute lookup and method call that list.append() requires. This is why list comprehensions are consistently faster than their for loop equivalents, as we will demonstrate with benchmarks later in this guide.

A brief history

List comprehensions were introduced in Python 2.0 through PEP 202, which was accepted in 2000. The syntax was directly inspired by set-builder notation in mathematics and similar constructs in functional languages like Haskell. The PEP's motivation was simple: Python programmers were already using map() and filter() with lambda to achieve these transformations, but the resulting code was often harder to read than the equivalent loop. List comprehensions provided a middle ground -- more concise than a loop, more readable than nested map/filter/lambda calls.

Since Python 3.0, list comprehensions have their own scope. Variables defined inside a comprehension do not leak into the enclosing scope, which eliminated a common source of subtle bugs that existed in Python 2.

# Python 3: x does not leak
squares = [x**2 for x in range(5)]
# print(x)  # NameError: name 'x' is not defined

# This was a real problem in Python 2 where x would be 4 after this line

Basic syntax and examples

The core pattern of a list comprehension has three parts: an expression that produces each element, a variable that iterates over the source, and an iterable that provides the input data.

Simple transformations

The most common use case is applying a transformation to every element in a sequence.

# Double every number
numbers = [1, 2, 3, 4, 5]
doubled = [n * 2 for n in numbers]
# [2, 4, 6, 8, 10]

# Square every number
squares = [n ** 2 for n in range(1, 11)]
# [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

# Convert temperatures from Celsius to Fahrenheit
celsius = [0, 10, 20, 30, 40]
fahrenheit = [(c * 9/5) + 32 for c in celsius]
# [32.0, 50.0, 68.0, 86.0, 104.0]

String operations

List comprehensions work naturally with string methods. This is where they often produce the most readable code compared to map().

words = ["hello", "world", "python", "list"]

# Uppercase every word
upper_words = [w.upper() for w in words]
# ['HELLO', 'WORLD', 'PYTHON', 'LIST']

# Get the length of each word
lengths = [len(w) for w in words]
# [5, 5, 6, 4]

# Strip whitespace from user input
raw_inputs = ["  alice ", "bob  ", " charlie "]
cleaned = [s.strip() for s in raw_inputs]
# ['alice', 'bob', 'charlie']

# Extract first character from each word
initials = [w[0] for w in words]
# ['h', 'w', 'p', 'l']

Type conversions

Converting between types is a frequent pattern, especially when reading data from files or external sources where everything arrives as strings.

# Convert strings to integers
str_numbers = ["1", "2", "3", "4", "5"]
int_numbers = [int(s) for s in str_numbers]
# [1, 2, 3, 4, 5]

# Convert strings to floats
prices_raw = ["19.99", "5.50", "12.00", "3.75"]
prices = [float(p) for p in prices_raw]
# [19.99, 5.5, 12.0, 3.75]

# Convert a list of key/value tuples to a dictionary (a dict comprehension, covered later)
pairs = [("name", "Alice"), ("age", "30"), ("city", "NYC")]
record = {k: v for k, v in pairs}
# {'name': 'Alice', 'age': '30', 'city': 'NYC'}

Calling functions

The expression in a list comprehension can be any valid Python expression, including function calls.


import os

# Get absolute paths for all files in a directory listing
filenames = ["data.csv", "config.yaml", "readme.md"]
full_paths = [os.path.abspath(f) for f in filenames]

# Apply a custom function
def normalize(value, min_val, max_val):
    return (value - min_val) / (max_val - min_val)

raw_scores = [45, 67, 89, 23, 91]
normalized = [normalize(s, min(raw_scores), max(raw_scores)) for s in raw_scores]
# [0.3235..., 0.6470..., 0.9705..., 0.0, 1.0]

Conditional filtering

List comprehensions become truly powerful when you add filtering conditions. There are two distinct patterns here, and they work differently.

Filtering with if

An if clause at the end of a comprehension filters which elements get included. Only items where the condition is True produce output.

numbers = range(-10, 11)

# Keep only positive numbers
positives = [n for n in numbers if n > 0]
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Keep only even numbers
evens = [n for n in numbers if n % 2 == 0]
# [-10, -8, -6, -4, -2, 0, 2, 4, 6, 8, 10]

# Filter strings by length
words = ["I", "am", "learning", "Python", "list", "comprehensions"]
long_words = [w for w in words if len(w) > 4]
# ['learning', 'Python', 'comprehensions']

# Filter out None values
data = [1, None, 3, None, 5, None, 7]
clean_data = [x for x in data if x is not None]
# [1, 3, 5, 7]

Conditional expression with if/else (ternary)

When you want to transform every element but apply different transformations based on a condition, you use a conditional expression (ternary operator) in the expression part of the comprehension. Note the different position -- it goes before the for, not after it.

numbers = [-3, -2, -1, 0, 1, 2, 3]

# Replace negatives with zero
clamped = [n if n > 0 else 0 for n in numbers]
# [0, 0, 0, 0, 1, 2, 3]

# Label numbers as even or odd
labels = ["even" if n % 2 == 0 else "odd" for n in range(1, 6)]
# ['odd', 'even', 'odd', 'even', 'odd']

# Apply different discounts based on membership
prices = [100, 200, 50, 300]
is_member = True
discounted = [p * 0.8 if is_member else p * 0.95 for p in prices]
# [80.0, 160.0, 40.0, 240.0]

The key difference: [x for x in items if condition] filters (output may be shorter than input). [a if condition else b for x in items] transforms (output is always the same length as input).
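A small side-by-side, using the same input for both forms, makes the length difference concrete:

```python
items = [1, 2, 3, 4]

# filter form: output can be shorter than the input
filtered = [x for x in items if x % 2 == 0]
# [2, 4]

# ternary form: output always matches the input length
mapped = [x if x % 2 == 0 else 0 for x in items]
# [0, 2, 0, 4]
```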

Multiple conditions

You can chain multiple if clauses, which acts as a logical AND.

numbers = range(1, 51)

# Numbers divisible by both 3 and 5
fizzbuzz = [n for n in numbers if n % 3 == 0 if n % 5 == 0]
# [15, 30, 45]

# This is equivalent to using 'and'
fizzbuzz_alt = [n for n in numbers if n % 3 == 0 and n % 5 == 0]
# [15, 30, 45]

# Combining filter and transform
# Keep positive even numbers, doubled
result = [n * 2 for n in range(-10, 11) if n > 0 if n % 2 == 0]
# [4, 8, 12, 16, 20]

You can also combine filtering with conditional expressions for more complex logic.

# FizzBuzz in a single comprehension
fizzbuzz = [
    "FizzBuzz" if n % 15 == 0
    else "Fizz" if n % 3 == 0
    else "Buzz" if n % 5 == 0
    else str(n)
    for n in range(1, 16)
]
# ['1', '2', 'Fizz', '4', 'Buzz', 'Fizz', '7', '8', 'Fizz', 'Buzz',
#  '11', 'Fizz', '13', '14', 'FizzBuzz']

That FizzBuzz example is about as complex as a comprehension should get. Beyond this point, a traditional loop is clearer.

Nested list comprehensions

Nested list comprehensions involve multiple for clauses within a single comprehension. The reading order matches the order you would write nested for loops -- the outermost loop comes first.

Flattening nested lists

The most common use of nested comprehensions is flattening a list of lists into a single list.

matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Flatten the matrix
flat = [cell for row in matrix for cell in row]
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

The equivalent nested loop makes the reading order clear:

flat = []
for row in matrix:        # outer loop -> first 'for' in comprehension
    for cell in row:      # inner loop -> second 'for' in comprehension
        flat.append(cell)

Matrix operations

Nested comprehensions are useful for creating and transforming matrices.

# Create a 5x5 identity matrix
identity = [[1 if i == j else 0 for j in range(5)] for i in range(5)]
# [[1, 0, 0, 0, 0],
#  [0, 1, 0, 0, 0],
#  [0, 0, 1, 0, 0],
#  [0, 0, 0, 1, 0],
#  [0, 0, 0, 0, 1]]

# Transpose a matrix
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
transposed = [[row[i] for row in matrix] for i in range(len(matrix[0]))]
# [[1, 4, 7], [2, 5, 8], [3, 6, 9]]

# Multiply every element by 2
doubled_matrix = [[cell * 2 for cell in row] for row in matrix]
# [[2, 4, 6], [8, 10, 12], [14, 16, 18]]

Note the difference between these two patterns. [cell for row in matrix for cell in row] produces a flat list (single comprehension with two for clauses). [[cell * 2 for cell in row] for row in matrix] produces a nested list (a comprehension inside another comprehension).
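A minimal side-by-side of the two patterns on the same small input:

```python
matrix = [[1, 2], [3, 4]]

# two for clauses in one comprehension: a flat list
flat = [cell for row in matrix for cell in row]
# [1, 2, 3, 4]

# a comprehension inside a comprehension: a nested list
nested = [[cell * 2 for cell in row] for row in matrix]
# [[2, 4], [6, 8]]
```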

Filtering in nested comprehensions

You can add if clauses to any of the for clauses in a nested comprehension.

matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Flatten but keep only even numbers
even_flat = [cell for row in matrix for cell in row if cell % 2 == 0]
# [2, 4, 6, 8]

# Flatten but skip the first row
partial_flat = [cell for i, row in enumerate(matrix) if i > 0 for cell in row]
# [4, 5, 6, 7, 8, 9]

When nesting gets too deep

Two levels of nesting is the practical limit for readability. Once you go to three or more levels, even experienced Python developers have to stop and mentally trace the execution order.

# This is too complex for a comprehension - don't do this
result = [
    val
    for group in data
    for subgroup in group
    for item in subgroup
    if item.active
    for val in item.values
    if val > threshold
]

# Use a regular loop instead
result = []
for group in data:
    for subgroup in group:
        for item in subgroup:
            if item.active:
                for val in item.values:
                    if val > threshold:
                        result.append(val)

The loop version is longer but immediately readable. The comprehension version requires mental gymnastics to follow. When you find yourself writing a comprehension that spans more than three lines or has more than two for clauses, extract the logic into a helper function or use a regular loop.
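One way to keep the call site flat is to move the nesting into a generator function. Here is a sketch using a minimal Item dataclass as a stand-in for the hypothetical objects above:

```python
from dataclasses import dataclass

@dataclass
class Item:
    active: bool
    values: list

def active_values(data, threshold):
    """Yield qualifying values one at a time, keeping each nesting level readable."""
    for group in data:
        for subgroup in group:
            for item in subgroup:
                if item.active:
                    for val in item.values:
                        if val > threshold:
                            yield val

# groups -> subgroups -> items, three levels deep
data = [[[Item(True, [5, 15]), Item(False, [99])]]]
print(list(active_values(data, threshold=10)))  # [15]
```

The generator keeps the streaming behavior of a comprehension while giving each level of nesting its own clearly indented line.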

Dictionary and set comprehensions

Python extends the comprehension syntax to dictionaries and sets. The syntax mirrors list comprehensions with minor differences in the delimiters and expression format.

Dictionary comprehensions

Dictionary comprehensions use curly braces and a key: value expression.

# Create a dictionary from two lists
names = ["Alice", "Bob", "Charlie"]
ages = [30, 25, 35]
name_to_age = {name: age for name, age in zip(names, ages)}
# {'Alice': 30, 'Bob': 25, 'Charlie': 35}

# Invert a dictionary (swap keys and values)
original = {"a": 1, "b": 2, "c": 3}
inverted = {v: k for k, v in original.items()}
# {1: 'a', 2: 'b', 3: 'c'}

# Filter a dictionary
scores = {"Alice": 92, "Bob": 67, "Charlie": 85, "Diana": 45}
passing = {name: score for name, score in scores.items() if score >= 70}
# {'Alice': 92, 'Charlie': 85}

# Transform values
prices_usd = {"apple": 1.20, "banana": 0.50, "cherry": 3.00}
prices_eur = {item: round(price * 0.92, 2) for item, price in prices_usd.items()}
# {'apple': 1.10, 'banana': 0.46, 'cherry': 2.76}

A practical pattern that comes up frequently is building lookup dictionaries from lists of objects.

# Build an index from a list of records
users = [
    {"id": 1, "name": "Alice", "email": "alice@example.com"},
    {"id": 2, "name": "Bob", "email": "bob@example.com"},
    {"id": 3, "name": "Charlie", "email": "charlie@example.com"},
]
user_by_id = {u["id"]: u for u in users}
# {1: {'id': 1, 'name': 'Alice', ...}, 2: {...}, 3: {...}}

# Now O(1) lookup instead of O(n) linear search
alice = user_by_id[1]

Set comprehensions

Set comprehensions use curly braces with a single expression (no colon), producing a set with unique values.

# Get unique word lengths from a text
text = "the quick brown fox jumps over the lazy dog"
unique_lengths = {len(word) for word in text.split()}
# {3, 4, 5}

# Extract unique file extensions
files = ["data.csv", "config.yaml", "readme.md", "backup.csv", "notes.md"]
extensions = {f.split(".")[-1] for f in files}
# {'csv', 'yaml', 'md'}

# Find characters that appear in a string (lowercase)
chars = {c.lower() for c in "Hello World" if c.isalpha()}
# {'h', 'e', 'l', 'o', 'w', 'r', 'd'}

Set comprehensions are especially useful when you need to deduplicate results or build sets for intersection/union operations.

# Find common tags between two article lists
article_a_tags = ["python", "tutorial", "beginner", "coding"]
article_b_tags = ["python", "advanced", "coding", "performance"]

tags_a = {t for t in article_a_tags}  # set(article_a_tags) is equivalent here
tags_b = {t for t in article_b_tags}
common = tags_a & tags_b
# {'python', 'coding'}

Generator expressions vs list comprehensions

Generator expressions look almost identical to list comprehensions -- the only syntactic difference is using parentheses instead of square brackets. But the behavior is fundamentally different.

# List comprehension - creates the entire list in memory
list_comp = [x ** 2 for x in range(1_000_000)]

# Generator expression - creates a lazy iterator
gen_expr = (x ** 2 for x in range(1_000_000))

The list comprehension immediately allocates memory for all one million results. The generator expression creates a lightweight iterator object that computes each value on demand when you iterate over it.
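You can watch the lazy evaluation happen with next():

```python
gen = (x ** 2 for x in range(1_000_000))
print(next(gen))  # 0
print(next(gen))  # 1
# the remaining 999,998 values are never computed unless something asks for them
```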

Memory comparison

The memory difference is dramatic for large datasets.


import sys

# List comprehension: stores all values
list_result = [x ** 2 for x in range(1_000_000)]
print(sys.getsizeof(list_result))
# ~8,448,728 bytes (about 8 MB)

# Generator expression: stores almost nothing
gen_result = (x ** 2 for x in range(1_000_000))
print(sys.getsizeof(gen_result))
# ~200 bytes (constant, regardless of input size)

This is not a minor optimization. If you are processing a 10 GB log file line by line and only need to compute a sum, a list comprehension would try to hold all transformed values in memory simultaneously. A generator expression would process one value at a time, using kilobytes instead of gigabytes.
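As a small illustration of that streaming pattern (a throwaway temp file stands in for the large log):

```python
import os
import tempfile

# write a tiny stand-in log file; a real one could be gigabytes
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as f:
    f.write("INFO ok\nERROR boom\nINFO ok\nERROR crash\n")
    log_path = f.name

# the generator pulls one line at a time; the file is never fully in memory
with open(log_path) as log:
    error_count = sum(1 for line in log if "ERROR" in line)

print(error_count)  # 2
os.unlink(log_path)
```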

When to use generators

Use a generator expression when:

  1. You only iterate once. If you need to loop through the results a single time and never reference them again, a generator is strictly better.
  2. The dataset is large. Any time you are working with files, database results, or API responses that could be arbitrarily large.
  3. You are passing results to a function that consumes iterables. Functions like sum(), any(), all(), min(), max(), and "".join() work perfectly with generators.

Use a list comprehension when:

  1. You need to index into the results (e.g., results[5]).
  2. You need to iterate multiple times.
  3. You need to know the length (e.g., len(results)).
  4. The dataset is small and the memory difference is irrelevant.
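One caveat behind rule 1: a generator can only be consumed once, which is exactly why multiple passes call for a list.

```python
gen = (x * 2 for x in range(3))
print(list(gen))  # [0, 2, 4]
print(list(gen))  # [] -- the generator is exhausted after the first pass
```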

Using generators with built-in functions

Generator expressions pair naturally with aggregation functions. When you pass a generator expression as the sole argument to a function, you can omit the outer parentheses for cleaner syntax.

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Sum of squares (no extra parentheses needed)
total = sum(x ** 2 for x in numbers)
# 385

# Check if any number is negative
has_negative = any(x < 0 for x in numbers)
# False

# Check if all numbers are positive
all_positive = all(x > 0 for x in numbers)
# True

# Find the longest word
words = ["python", "list", "comprehension", "guide"]
longest = max(words, key=len)
# 'comprehension'

# Join transformed strings
csv_line = ",".join(str(x * 10) for x in range(5))
# '0,10,20,30,40'

Benchmark: memory usage on large datasets

Here is a concrete comparison processing a realistic data volume.


import tracemalloc

def process_with_list_comp(n):
    """Sum of squares using list comprehension."""
    tracemalloc.start()
    result = sum([x ** 2 for x in range(n)])
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, peak

def process_with_generator(n):
    """Sum of squares using generator expression."""
    tracemalloc.start()
    result = sum(x ** 2 for x in range(n))
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, peak

n = 10_000_000
_, list_peak = process_with_list_comp(n)
_, gen_peak = process_with_generator(n)

print(f"List comprehension peak memory: {list_peak / 1024 / 1024:.1f} MB")
print(f"Generator expression peak memory: {gen_peak / 1024 / 1024:.1f} MB")
# List comprehension peak memory: ~401.3 MB
# Generator expression peak memory: ~0.0 MB

That is roughly 400 MB versus effectively zero for the same computation with the same result. When you are only aggregating data, generator expressions are the correct choice.

The walrus operator in comprehensions (Python 3.8+)

Python 3.8 introduced the walrus operator (:=), formally known as assignment expressions (PEP 572). In comprehensions, it solves a specific problem: when you need to both test a computed value and include it in the output, the walrus operator lets you avoid computing it twice.

The problem it solves

Consider this scenario: you have a list of strings and want to keep only those that, when parsed as integers, exceed a threshold.

raw_data = ["10", "abc", "25", "def", "42", "7"]

# Without the walrus operator, you need a safe parser and repeated calls
def try_parse(s):
    try:
        return int(s)
    except ValueError:
        return None

# try_parse runs up to three times per element: twice to filter, once to produce the value
results = [try_parse(s) for s in raw_data if try_parse(s) is not None and try_parse(s) > 15]
# [25, 42] -- but try_parse is called up to 3 times per element

The walrus operator solution

The walrus operator assigns and tests in a single expression.

# With walrus operator - compute once, use twice
results = [parsed for s in raw_data if (parsed := try_parse(s)) is not None and parsed > 15]
# [25, 42] -- try_parse is called exactly once per element

Real-world examples

Filtering with expensive computations:


import re

log_lines = [
    "2026-03-15 ERROR: Connection timeout on server-1",
    "2026-03-15 INFO: Health check passed",
    "2026-03-15 ERROR: Disk usage at 95% on server-3",
    "2026-03-15 DEBUG: Cache hit ratio 0.87",
    "2026-03-15 WARNING: Memory usage high on server-2",
]

# Extract server names from error lines only
pattern = re.compile(r"ERROR:.*?(server-\d+)")
error_servers = [
    match.group(1)
    for line in log_lines
    if (match := pattern.search(line))
]
# ['server-1', 'server-3']

Without the walrus operator, you would either call pattern.search(line) twice (once to check, once to extract) or fall back to a regular loop.
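For contrast, here is the double-call version of the same extraction (shown on a trimmed copy of the data) -- the regex runs twice per matching line:

```python
import re

log_lines = [
    "2026-03-15 ERROR: Connection timeout on server-1",
    "2026-03-15 INFO: Health check passed",
]
pattern = re.compile(r"ERROR:.*?(server-\d+)")

# pattern.search() runs once in the filter and again in the expression
error_servers = [
    pattern.search(line).group(1)
    for line in log_lines
    if pattern.search(line)
]
print(error_servers)  # ['server-1']
```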

Chaining transformations with intermediate results:

# Process data where each step depends on the previous
data = [{"name": "Alice", "scores": [85, 92, 78]},
        {"name": "Bob", "scores": [60, 55, 70]},
        {"name": "Charlie", "scores": [95, 88, 91]}]

# Find students with average score above 80
honor_roll = [
    (student["name"], avg)
    for student in data
    if (avg := sum(student["scores"]) / len(student["scores"])) >= 80
]
# [('Alice', 85.0), ('Charlie', 91.33...)]

When to use the walrus operator in comprehensions

Use it when:

  • You need to compute an expensive value and both filter on it and include it in the output.
  • A regex match needs to be tested and then have groups extracted.
  • An intermediate calculation is referenced multiple times.

Avoid it when:

  • The comprehension is already complex. Adding := to a multi-condition comprehension makes it harder to read.
  • The computation is trivial. [y for x in items if (y := x * 2) > 10] does not save meaningful work compared to [x * 2 for x in items if x * 2 > 10].

Performance benchmarks

One of the most frequent questions about list comprehensions is whether they are actually faster. The answer is yes, but the magnitude depends on the operation. Here are concrete benchmarks run on Python 3.12 using timeit with 1,000 iterations for each approach.

List comprehension vs for loop vs map/filter

Test 1: Simple transformation (squaring numbers)


import timeit

n = 100_000

# For loop
def for_loop():
    result = []
    for i in range(n):
        result.append(i ** 2)
    return result

# List comprehension
def list_comp():
    return [i ** 2 for i in range(n)]

# map()
def map_approach():
    return list(map(lambda i: i ** 2, range(n)))

print(f"For loop:           {timeit.timeit(for_loop, number=100):.3f}s")
print(f"List comprehension: {timeit.timeit(list_comp, number=100):.3f}s")
print(f"map() + lambda:     {timeit.timeit(map_approach, number=100):.3f}s")

# Typical results (Python 3.12, Apple M2):
# For loop:           1.482s
# List comprehension: 1.168s
# map() + lambda:     1.320s

The list comprehension is about 21% faster than the for loop and 12% faster than map() with a lambda. When map() uses a built-in C function instead of a lambda, it can match or beat list comprehension speed:

# map() with a C built-in function
def map_builtin():
    return list(map(str, range(n)))

def list_comp_str():
    return [str(i) for i in range(n)]

# map() with built-in is ~5-10% faster here because there is no
# Python-level function call overhead

Test 2: Filtering

import timeit

numbers = list(range(100_000))

def for_loop_filter():
    result = []
    for n in numbers:
        if n % 3 == 0:
            result.append(n)
    return result

def list_comp_filter():
    return [n for n in numbers if n % 3 == 0]

def filter_approach():
    return list(filter(lambda n: n % 3 == 0, numbers))

print(f"For loop:           {timeit.timeit(for_loop_filter, number=100):.3f}s")
print(f"List comprehension: {timeit.timeit(list_comp_filter, number=100):.3f}s")
print(f"filter() + lambda:  {timeit.timeit(filter_approach, number=100):.3f}s")

# Typical results:
# For loop:           0.698s
# List comprehension: 0.471s
# filter() + lambda:  0.643s

For filtering, list comprehensions are about 33% faster than for loops and 27% faster than filter() with a lambda.

Bytecode comparison

The performance difference comes from how Python compiles each approach. You can inspect this with the dis module.


import dis

# For loop bytecode
def for_loop_example():
    result = []
    for x in range(10):
        result.append(x * 2)
    return result

# List comprehension bytecode
def list_comp_example():
    return [x * 2 for x in range(10)]

print("=== For loop ===")
dis.dis(for_loop_example)
print("\n=== List comprehension ===")
dis.dis(list_comp_example)

The key difference in the bytecode output: the for loop generates LOAD_ATTR (to look up append), CALL (to invoke it), and POP_TOP (to discard the return value) on every iteration. The list comprehension generates a single LIST_APPEND instruction that directly appends to the list being built, skipping the attribute lookup and function call overhead entirely.

Here is a simplified view of the critical inner loop bytecode:

For loop (per iteration):
  LOAD_FAST     result
  LOAD_ATTR     append      # attribute lookup every iteration
  LOAD_FAST     x
  LOAD_CONST    2
  BINARY_OP     5 (*)
  CALL          1           # function call overhead
  POP_TOP                   # discard append()'s None return

List comprehension (per iteration):
  LOAD_FAST     x
  LOAD_CONST    2
  BINARY_OP     5 (*)
  LIST_APPEND   2           # direct C-level append, no lookup

The comprehension saves three bytecode instructions per iteration. Over millions of iterations, this adds up.

When the performance advantage disappears

The comprehension speed advantage is consistent for simple operations but diminishes as the per-element work increases. If each iteration involves a database query, an API call, or any I/O-bound operation, the overhead of append() lookups is negligible compared to the actual work being done.


import time

def slow_transform(x):
    """Simulate an expensive operation."""
    time.sleep(0.0001)  # 0.1ms per element
    return x * 2

# When the per-element work is expensive, the difference vanishes
data = list(range(1000))

# Both take ~0.1 seconds, the append overhead is irrelevant
result_loop = []
for x in data:
    result_loop.append(slow_transform(x))

result_comp = [slow_transform(x) for x in data]

Rule of thumb: optimize for readability first. If profiling shows a hot loop where comprehension speed matters, switch to a comprehension. But do not sacrifice readability for a 20% speedup in code that runs once during startup.

Real-world data processing examples

List comprehensions shine in data processing pipelines where you transform, filter, and restructure data from external sources. Here are patterns you will encounter in production code.

CSV data filtering and transformation


import csv
from io import StringIO

csv_data = """name,age,city,salary
Alice,32,New York,95000
Bob,28,San Francisco,88000
Charlie,45,Chicago,120000
Diana,38,New York,105000
Eve,24,San Francisco,72000
Frank,51,Chicago,135000"""

reader = csv.DictReader(StringIO(csv_data))
rows = list(reader)

# Filter employees in New York with salary > 100k
ny_high_earners = [
    {"name": row["name"], "salary": int(row["salary"])}
    for row in rows
    if row["city"] == "New York" and int(row["salary"]) > 100000
]
# [{'name': 'Diana', 'salary': 105000}]

# Calculate average salary per city
cities = {row["city"] for row in rows}
avg_by_city = {
    city: sum(int(r["salary"]) for r in rows if r["city"] == city)
         / sum(1 for r in rows if r["city"] == city)
    for city in cities
}
# {'New York': 100000.0, 'San Francisco': 80000.0, 'Chicago': 127500.0}

# Create a summary with age brackets
def age_bracket(age):
    if age < 30: return "20s"
    if age < 40: return "30s"
    if age < 50: return "40s"
    return "50+"

summary = [
    {**row, "bracket": age_bracket(int(row["age"]))}
    for row in rows
]

JSON API response processing


# Simulated API response
api_response = {
    "status": "success",
    "data": {
        "users": [
            {"id": 1, "name": "Alice", "role": "admin", "active": True, "last_login": "2026-03-14"},
            {"id": 2, "name": "Bob", "role": "user", "active": False, "last_login": "2026-01-10"},
            {"id": 3, "name": "Charlie", "role": "user", "active": True, "last_login": "2026-03-15"},
            {"id": 4, "name": "Diana", "role": "moderator", "active": True, "last_login": "2026-03-13"},
            {"id": 5, "name": "Eve", "role": "user", "active": True, "last_login": "2026-02-28"},
        ]
    }
}

users = api_response["data"]["users"]

# Extract active user IDs for a notification system
active_user_ids = [u["id"] for u in users if u["active"]]
# [1, 3, 4, 5]

# Build a role-based access control lookup
role_to_users = {
    role: [u["name"] for u in users if u["role"] == role]
    for role in {u["role"] for u in users}
}
# {'admin': ['Alice'], 'user': ['Bob', 'Charlie', 'Eve'], 'moderator': ['Diana']}

# Transform API response into a different schema
transformed = [
    {
        "user_id": u["id"],
        "display_name": u["name"].upper(),
        "is_admin": u["role"] == "admin",
        "status": "active" if u["active"] else "inactive",
    }
    for u in users
]

Log file parsing


import re
from collections import Counter

log_lines = [
    "2026-03-15 08:23:01 INFO  [web-server] Request GET /api/users 200 45ms",
    "2026-03-15 08:23:02 ERROR [web-server] Request POST /api/orders 500 1203ms",
    "2026-03-15 08:23:02 INFO  [web-server] Request GET /api/products 200 23ms",
    "2026-03-15 08:23:03 WARN  [auth-service] Token expired for user_id=4821",
    "2026-03-15 08:23:04 ERROR [db-pool] Connection timeout after 30000ms",
    "2026-03-15 08:23:05 INFO  [web-server] Request GET /api/users 200 38ms",
    "2026-03-15 08:23:05 ERROR [web-server] Request GET /api/reports 503 5023ms",
]

# Extract all error messages
errors = [line for line in log_lines if " ERROR " in line]

# Parse request logs into structured data
request_pattern = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \w+\s+\[.*?\] "
    r"Request (\w+) (\S+) (\d+) (\d+)ms"
)

requests = [
    {
        "timestamp": match.group(1),
        "method": match.group(2),
        "path": match.group(3),
        "status": int(match.group(4)),
        "duration_ms": int(match.group(5)),
    }
    for line in log_lines
    if (match := request_pattern.search(line))
]

# Find slow requests (over 1 second)
slow_requests = [r for r in requests if r["duration_ms"] > 1000]
# [{'method': 'POST', 'path': '/api/orders', 'status': 500, 'duration_ms': 1203}, ...]

# Count requests per endpoint
endpoint_counts = Counter(r["path"] for r in requests)
# Counter({'/api/users': 2, '/api/orders': 1, '/api/products': 1, '/api/reports': 1})

File system operations


import time
from pathlib import Path

project_dir = Path("/path/to/project")

# Find all Python files (non-recursive)
py_files = [f for f in project_dir.iterdir() if f.suffix == ".py"]

# Find all Python files recursively, excluding __pycache__ and .venv
py_files_recursive = [
    f for f in project_dir.rglob("*.py")
    if "__pycache__" not in f.parts and ".venv" not in f.parts
]

# Get file sizes for all files in a directory
file_sizes = {
    f.name: f.stat().st_size
    for f in project_dir.iterdir()
    if f.is_file()
}

# Find files modified in the last 24 hours
one_day_ago = time.time() - 86400
recent_files = [
    f for f in project_dir.rglob("*")
    if f.is_file() and f.stat().st_mtime > one_day_ago
]

# Group files by extension
extensions = {f.suffix for f in project_dir.rglob("*") if f.is_file() and f.suffix}
files_by_ext = {
    ext: [f.name for f in project_dir.rglob(f"*{ext}")]
    for ext in extensions
}

Common anti-patterns

List comprehensions are powerful, but they are frequently misused. Here are the patterns that experienced Python developers recognize as code smells.

Anti-pattern 1: comprehensions that are too long

If a comprehension does not fit on a single line (roughly 79-88 characters, depending on your team's style), it is a signal that it might be doing too much.

# Bad: too much logic crammed into one line
result = [transform(item.value) for item in collection if item.is_valid() and item.category in allowed_categories and item.date >= start_date]

# Better: break across lines
result = [
    transform(item.value)
    for item in collection
    if item.is_valid()
    and item.category in allowed_categories
    and item.date >= start_date
]

# Best (if logic is complex): use a helper function
def should_include(item):
    return (
        item.is_valid()
        and item.category in allowed_categories
        and item.date >= start_date
    )

result = [transform(item.value) for item in collection if should_include(item)]

Anti-pattern 2: side effects in comprehensions

Comprehensions should produce a value. Using them for side effects is confusing and wasteful because the resulting list is created and immediately discarded.

# Bad: using a comprehension for side effects
[print(item) for item in items]  # creates a list of None values and throws it away
[send_email(user) for user in users]  # same problem
[db.insert(record) for record in records]  # even worse: hides I/O operations

# Good: use a for loop for side effects
for item in items:
    print(item)

for user in users:
    send_email(user)

for record in records:
    db.insert(record)

The comprehension version is not just bad style -- it wastes memory by building a list of None return values that nobody uses. More importantly, it hides the intent. A reader seeing a list comprehension expects the resulting list to be used. When it is not, they have to stop and figure out what the code is actually doing.

Anti-pattern 3: nesting more than two levels deep

# Bad: three levels of nesting
result = [
    word.lower()
    for document in corpus
    for paragraph in document.paragraphs
    for sentence in paragraph.sentences
    for word in sentence.words
    if word.is_alpha()
]

# Better: extract into a generator function
def extract_words(corpus):
    for document in corpus:
        for paragraph in document.paragraphs:
            for sentence in paragraph.sentences:
                for word in sentence.words:
                    if word.is_alpha():
                        yield word.lower()

result = list(extract_words(corpus))

The generator function is more lines of code, but each line is trivially understandable. The deeply nested comprehension requires holding the entire structure in your head simultaneously.

Anti-pattern 4: building huge lists that should be generators

# Bad: materializing a huge list just to iterate once
total = sum([x ** 2 for x in range(10_000_000)])  # ~400 MB wasted

# Good: use a generator expression
total = sum(x ** 2 for x in range(10_000_000))  # ~0 MB extra

# Bad: building a list just to check a condition
if len([x for x in items if x.is_valid()]) > 0:
    process(items)

# Good: use any() with a generator
if any(x.is_valid() for x in items):
    process(items)
# Bonus: any() short-circuits -- it stops at the first True

Anti-pattern 5: unnecessarily complex conditional expressions

# Bad: deeply nested ternary operators
result = [
    "high" if score > 90 else "medium" if score > 70 else "low" if score > 50 else "fail"
    for score in scores
]

# Better: use a helper function
def grade(score):
    if score > 90:
        return "high"
    if score > 70:
        return "medium"
    if score > 50:
        return "low"
    return "fail"

result = [grade(score) for score in scores]

The helper function version is self-documenting. The name grade tells you what the transformation does, and the if/return chains are immediately clear.

Best practices

When to use a list comprehension vs a for loop

Use a list comprehension when:

  • You are building a new list from an existing iterable.
  • The transformation and/or filter logic is simple (fits on one or two lines).
  • The resulting list will actually be used.
  • The logic does not require try/except error handling.

Use a for loop when:

  • You need side effects (printing, writing, sending data).
  • The logic involves try/except blocks.
  • The loop body requires multiple statements.
  • Readability suffers with the comprehension form.
  • You need to break or continue based on runtime conditions.
  • You are accumulating into something other than a list (use reduce or a loop for complex accumulations).

# Comprehension is perfect here
squares = [x ** 2 for x in range(20) if x % 2 == 0]

# For loop is better here (try/except, side effects)
valid_records = []
for line in raw_lines:
    try:
        record = json.loads(line)
        if record.get("status") == "active":
            valid_records.append(record)
            logger.info(f"Processed record {record['id']}")
    except json.JSONDecodeError:
        logger.warning(f"Skipping malformed line: {line[:50]}")

Readability guidelines

The Python community generally follows these readability conventions:

  1. If the comprehension fits on one line and is immediately clear, keep it on one line.
names = [user.name for user in users]
  2. If it needs a condition, it can go on one line if it is still clear.
active_names = [user.name for user in users if user.is_active]
  3. If it exceeds ~80 characters, break it across lines with one clause per line.
results = [
    transform(item)
    for item in collection
    if item.is_valid()
]
  4. If the expression part is complex, extract a function.
# Instead of this
data = [
    {"name": u.name, "email": u.email, "role": u.role.name, "active": u.is_active}
    for u in users
    if u.department == target_dept
]

# Do this
def user_summary(user):
    return {
        "name": user.name,
        "email": user.email,
        "role": user.role.name,
        "active": user.is_active,
    }

data = [user_summary(u) for u in users if u.department == target_dept]

PEP 8 style recommendations

PEP 8 has few rules specific to comprehensions, but its general style guidelines apply:

  • Keep lines under 79 characters (or whatever your team's limit is -- 88 and 120 are common).
  • Use a consistent style for multi-line comprehensions. The most readable format puts each clause on its own line, indented by four spaces from the opening bracket.
  • Do not use backslash line continuations in comprehensions. The brackets provide implicit continuation.
# Good: implicit line continuation inside brackets
result = [
    process(item)
    for item in very_long_collection_name
    if item.meets_criteria()
    and item.is_not_excluded()
]

# Bad: backslash continuation
result = [process(item) for item in \
    very_long_collection_name if \
    item.meets_criteria()]

Breaking long comprehensions across lines

When a comprehension needs multiple lines, there is a standard pattern that the Python community has settled on:

# Pattern 1: Simple multi-line
result = [
    expression
    for variable in iterable
    if condition
]

# Pattern 2: Nested for clauses
result = [
    expression
    for outer_var in outer_iterable
    for inner_var in inner_iterable
    if condition
]

# Pattern 3: Complex expression
result = [
    {
        "key1": value1,
        "key2": value2,
        "key3": value3,
    }
    for item in collection
    if item.is_valid()
]

# Pattern 4: Dict comprehension, multi-line
mapping = {
    key_expression: value_expression
    for item in collection
    if condition
}

A final note on readability over cleverness

The most important rule for list comprehensions is the same as for all Python code: optimize for the reader, not the writer. A comprehension that saves you thirty seconds to write but costs every future reader ten seconds to understand is a net loss. The goal is not to put as much logic as possible into a single expression. The goal is to make intent clear at a glance.

When in doubt, write the comprehension, then read it back. If you have to think about what it does for more than a moment, refactor it. Extract a function, use a loop, add a comment -- whatever makes the code reveal its intent without requiring the reader to be a comprehension expert.

# This is clever but requires mental overhead to parse
valid = [y for x in raw if (y := parse(x)) and y.status in ALLOWED and y.score > cutoff]

# This is longer but immediately clear
def is_acceptable(record):
    return record is not None and record.status in ALLOWED and record.score > cutoff

valid = [parsed for x in raw if is_acceptable(parsed := parse(x))]

# Or just use a loop if the logic is complex enough
valid = []
for x in raw:
    parsed = parse(x)
    if parsed and parsed.status in ALLOWED and parsed.score > cutoff:
        valid.append(parsed)

All three versions produce the same result. The right choice depends on your team's familiarity with walrus operators, the complexity of the surrounding code, and whether the comprehension pattern appears frequently enough in your codebase that readers will recognize it instantly.

List comprehensions are one of Python's best features when used well. They make simple transformations and filters visually obvious, they run faster than the equivalent loops, and they reduce the boilerplate that obscures intent. The key is knowing when to reach for them and when to reach for something else.

Frequently Asked Questions

Are list comprehensions faster than for loops in Python?

Yes, list comprehensions are typically 10-30% faster than equivalent for loops because they are optimized at the bytecode level. The interpreter uses a specialized LIST_APPEND operation instead of repeated list.append() method lookups. However, for very complex logic, the difference becomes negligible and readability should take priority.
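You can measure this yourself with a minimal timeit sketch (a rough illustration -- exact numbers vary by machine and Python version):

```python
import timeit

def with_loop(n):
    # Explicit loop: repeated list.append() lookups and calls
    result = []
    for x in range(n):
        result.append(x * 2)
    return result

def with_comprehension(n):
    # Same result, built with the specialized LIST_APPEND bytecode
    return [x * 2 for x in range(n)]

loop_time = timeit.timeit(lambda: with_loop(100_000), number=20)
comp_time = timeit.timeit(lambda: with_comprehension(100_000), number=20)
print(f"for loop:      {loop_time:.3f}s")
print(f"comprehension: {comp_time:.3f}s")
```

On most machines the comprehension comes out measurably ahead, though the gap shrinks as the per-element work grows.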

When should I NOT use a list comprehension?

Avoid list comprehensions when: the logic requires more than one condition or transformation (hard to read), you need side effects (like printing or writing to a file), the resulting list would be too large for memory (use a generator expression instead), or you need try/except error handling inside the loop.

What is the difference between a list comprehension and a generator expression?

A list comprehension [x for x in range(n)] creates the entire list in memory. A generator expression (x for x in range(n)) produces values lazily, one at a time, using minimal memory. Use generators when you only need to iterate once or when the dataset is large.
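A quick sketch that makes the memory difference concrete (sizes are CPython implementation details, so treat the exact numbers as approximate):

```python
import sys

# The list materializes every value up front
squares_list = [x * x for x in range(100_000)]

# The generator holds only its iteration state, not the values
squares_gen = (x * x for x in range(100_000))

print(sys.getsizeof(squares_list))  # hundreds of kilobytes
print(sys.getsizeof(squares_gen))   # a couple hundred bytes, regardless of range size

# A generator can be consumed only once
total = sum(squares_gen)
print(sum(squares_gen))  # 0 -- the generator is now exhausted
```

The one-shot behavior is the main trade-off: if you need to iterate more than once, use a list.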

Can I use the walrus operator in list comprehensions?

Yes, Python 3.8+ supports the walrus operator (:=) in comprehensions. It lets you assign and test a value in one step: [y for x in data if (y := expensive_fn(x)) > threshold]. This avoids calling the function twice — once for filtering and once for the result.
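Here is a small sketch of the double-call problem (expensive_fn is a hypothetical stand-in for a costly computation):

```python
data = [1, 5, 12, 7, 20]
threshold = 20

def expensive_fn(x):
    # Stand-in for something slow, e.g. a network call or heavy parse
    return x * 3

# Without the walrus: expensive_fn runs twice per surviving element
twice = [expensive_fn(x) for x in data if expensive_fn(x) > threshold]

# With the walrus: one call per element, reused in both filter and result
once = [y for x in data if (y := expensive_fn(x)) > threshold]

assert twice == once == [36, 21, 60]
```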

How do nested list comprehensions work?

Nested list comprehensions follow the same order as nested for loops. [cell for row in matrix for cell in row] is equivalent to a for row in matrix loop with a for cell in row inner loop. The outermost loop comes first in the comprehension.
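The equivalence described above can be checked directly:

```python
matrix = [[1, 2, 3], [4, 5, 6]]

# Clauses read left to right, exactly like nested for loops
flat = [cell for row in matrix for cell in row]

# The equivalent explicit loops
flat_loop = []
for row in matrix:
    for cell in row:
        flat_loop.append(cell)

assert flat == flat_loop == [1, 2, 3, 4, 5, 6]
```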

