Samuel Ochaba

Posted on Jan 6

Why sum(x**2 for x in range(1000000)) Uses 4000x Less Memory

#python #performance #tutorial #programming

Pop quiz. Which uses less memory?

# Option A
sum([x ** 2 for x in range(1000000)])

# Option B
sum(x ** 2 for x in range(1000000))

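The only difference is the brackets. But the generator object in Option B is thousands of times smaller than the list Option A builds: ~4000x in the 100,000-item measurement below, and the gap widens as the input grows.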

Let's Measure It

>>> import sys
>>> nums = list(range(100000))

>>> list_comp = [x ** 2 for x in nums]
>>> gen_exp = (x ** 2 for x in nums)

>>> sys.getsizeof(list_comp)
824456    # ~800 KB

>>> sys.getsizeof(gen_exp)
200       # ~200 bytes

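The list comprehension creates 100,000 integers in memory immediately. Note that `sys.getsizeof` reports only the list's own array of references, not the integer objects it points to, so the list's real footprint is larger than ~800 KB.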

The generator expression? It's just a recipe. It computes values on demand.
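Because `sys.getsizeof` only sees the container, a peak-memory measurement gives a fuller picture. Here is a minimal sketch using the standard-library `tracemalloc` module; `peak_bytes` is a hypothetical helper name, and the exact numbers will vary by Python version and platform.

```python
import tracemalloc


def peak_bytes(make_total):
    """Run make_total() and return (result, peak traced bytes)."""
    tracemalloc.start()
    try:
        result = make_total()
        _, peak = tracemalloc.get_traced_memory()  # (current, peak)
    finally:
        tracemalloc.stop()  # also clears traces before the next run
    return result, peak


N = 100_000  # same size as the measurement above

list_total, list_peak = peak_bytes(lambda: sum([x ** 2 for x in range(N)]))
gen_total, gen_peak = peak_bytes(lambda: sum(x ** 2 for x in range(N)))

print(f"list comprehension peak:   {list_peak:,} bytes")
print(f"generator expression peak: {gen_peak:,} bytes")
```

Both calls return the same total; only the peak allocation differs.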

How Generator Expressions Work

>>> squares_list = [x ** 2 for x in range(1000000)]  # 1M items NOW
>>> squares_gen = (x ** 2 for x in range(1000000))   # Nothing yet

>>> squares_gen
<generator object <genexpr> at 0x...>

The generator object is tiny because it doesn't store values—it produces them when asked.
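To see "produces them when asked" in action, you can pull values one at a time with the built-in `next()`. A small sketch (the variable names are just for illustration):

```python
squares_gen = (x ** 2 for x in range(5))

first = next(squares_gen)      # computes 0 ** 2 and pauses
second = next(squares_gen)     # resumes and computes 1 ** 2
remaining = list(squares_gen)  # drains what is left: 2, 3, 4 squared

print(first, second, remaining)  # 0 1 [4, 9, 16]
```

Each value is computed only when something requests it, which is why the generator never needs to hold the whole sequence.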

The Catch: Single Use

>>> gen = (x for x in range(3))
>>> list(gen)
[0, 1, 2]
>>> list(gen)    # Try again
[]               # Empty! Generator exhausted.

Generators can only be iterated once. After that, they're spent.
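If you do need a second pass, two common workarounds are to build a fresh generator each time or to materialize the values once. A sketch, where `squares` is a hypothetical helper and not part of any library:

```python
def squares(n):
    """Return a brand-new generator on every call (hypothetical helper)."""
    return (x ** 2 for x in range(n))


# Option 1: a fresh generator per pass, memory stays small
total = sum(squares(5))    # 0 + 1 + 4 + 9 + 16
largest = max(squares(5))  # second pass over a new generator

# Option 2: materialize once, then reuse the list as often as needed
cached = list(squares(5))
total_again = sum(cached)
largest_again = max(cached)

print(total, largest, total_again, largest_again)  # 30 16 30 16
```

Option 1 recomputes the values on each pass; Option 2 trades memory for not recomputing.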

When to Use Each

Generator Expression	List Comprehension
Large data, single pass	Need random access
Feeding into `sum()`, `max()`, etc.	Need to iterate multiple times
Memory is a concern	Need `len()` or indexing

Pro Tip: Skip the Extra Parentheses

When a generator is the only argument to a function, you don't need double parentheses:

# These are equivalent
sum((x ** 2 for x in range(10)))
sum(x ** 2 for x in range(10))   # Cleaner

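# Same with other functions (sample data for illustration)
words = ["alpha", "be", "gamma"]
numbers = [1, 7, 3]
results = [2, 4, 6]
max(len(word) for word in words)
any(x > 5 for x in numbers)
all(x > 0 for x in results)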
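The shortcut applies only when the generator is the sole argument. Once you pass a second argument, the generator needs its own parentheses. A small sketch using the `start` argument of `sum()`:

```python
# The generator needs its own parentheses when there is a second argument
total = sum((x ** 2 for x in range(4)), 100)  # 100 + 0 + 1 + 4 + 9

# Without the inner parentheses, Python rejects the call at compile time
try:
    compile("sum(x ** 2 for x in range(4), 100)", "<example>", "eval")
    needs_parens = False
except SyntaxError:
    needs_parens = True

print(total, needs_parens)  # 114 True
```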

The Takeaway

Those brackets aren't just syntax—they're a memory decision.

[...] = "Build everything now"
(...) = "I'll figure it out as I go"

When you're processing large data and only need one pass, drop the brackets. Your RAM will thank you.

This is adapted from my upcoming book, Zero to AI Engineer: Python Foundations.
I share excerpts like this on Substack → https://substack.com/@samuelochaba

Top comments (2)

Mr. 0x1 • Jan 6

Great article! I took your advice to "measure it" and ran a script using tracemalloc. The difference is huge—I saw ~38MB for the list comprehension vs basically 0MB for the generator.

Just for fun, if you want to take efficiency to the extreme for this specific problem (sum of squares), you can actually do it in O(1) time and memory using the closed-form formula: n * (n - 1) * (2 * n - 1) // 6.

I benchmarked that against the generator and it's another ~40,000x speedup! But clearly, for generic data processing where math shortcuts don't exist, the generator is the way to go.
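As a quick sanity check of that formula, here is a sketch comparing it with the generator version for a few sizes. The function name `sum_of_squares_closed_form` is made up for illustration.

```python
def sum_of_squares_closed_form(n):
    """Sum of x ** 2 for x in range(n), i.e. 0^2 + 1^2 + ... + (n-1)^2."""
    return n * (n - 1) * (2 * n - 1) // 6


# Compare against the generator-based sum for several sizes
sizes = [0, 1, 2, 10, 1000]
matches = [
    sum_of_squares_closed_form(n) == sum(x ** 2 for x in range(n))
    for n in sizes
]

print(matches)  # [True, True, True, True, True]
```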

Samuel Ochaba • Jan 6

Thanks for actually measuring it! Love that you validated with tracemalloc. That ~38MB vs 0MB comparison really drives the point home.

And yes! The closed-form formula n * (n - 1) * (2 * n - 1) // 6 is the ultimate flex for this specific problem. Math always wins when it applies.

You've basically illustrated the performance hierarchy:

O(1) math trick → When the problem has a formula
Generator → When you need to process data once
List → When you need random access or multiple passes

Great addition to the discussion. I appreciate you sharing the benchmark!