DEV Community

Samuel Ochaba


Why sum(x**2 for x in range(1000000)) Uses 4000x Less Memory

Pop quiz. Which uses less memory?

# Option A
sum([x ** 2 for x in range(1000000)])

# Option B
sum(x ** 2 for x in range(1000000))

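The only difference is the brackets. But in the 100,000-item measurement below, the generator object is ~4000x smaller than the list, and the gap keeps growing with the input size.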

Let's Measure It

>>> import sys
>>> nums = list(range(100000))

>>> list_comp = [x ** 2 for x in nums]
>>> gen_exp = (x ** 2 for x in nums)

>>> sys.getsizeof(list_comp)
824456    # ~800 KB

>>> sys.getsizeof(gen_exp)
200       # ~200 bytes

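The list comprehension creates 100,000 integers in memory immediately. And sys.getsizeof only counts the list's own array of references, not the integer objects it points to, so the real footprint is even larger than ~800 KB.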

The generator expression? It's just a recipe. It computes values on demand.
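To see the full footprint rather than just the container size, the standard-library tracemalloc module can record peak allocations while each option runs. This is a rough sketch: peak_bytes is a helper name made up for this example, and the exact numbers will vary by Python version and platform.

```python
import tracemalloc


def peak_bytes(func):
    """Run func() and return the peak traced allocation in bytes.

    peak_bytes is a hypothetical helper for this example, not a library function.
    """
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


N = 100_000  # same size as the measurement above

# Option A builds the whole list before sum() sees a single value.
list_peak = peak_bytes(lambda: sum([x ** 2 for x in range(N)]))

# Option B hands sum() one value at a time.
gen_peak = peak_bytes(lambda: sum(x ** 2 for x in range(N)))

print(f"list comprehension peak:   {list_peak:,} bytes")
print(f"generator expression peak: {gen_peak:,} bytes")
```

Unlike sys.getsizeof, this counts the integer objects too, so the list figure should come out well above the ~800 KB container size.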

How Generator Expressions Work

>>> squares_list = [x ** 2 for x in range(1000000)]  # 1M items NOW
>>> squares_gen = (x ** 2 for x in range(1000000))   # Nothing yet

>>> squares_gen
<generator object <genexpr> at 0x...>

The generator object is tiny because it doesn't store values—it produces them when asked.
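A small sketch makes the "produces them when asked" part visible. The noisy_square function and the calls list are made up for this example; they just record which inputs have actually been computed.

```python
calls = []  # records every input that actually gets squared


def noisy_square(x):
    """Square x and note that the work happened (illustrative helper)."""
    calls.append(x)
    return x ** 2


squares_gen = (noisy_square(x) for x in range(5))

print(calls)                # → []  (nothing computed yet)

first = next(squares_gen)   # computes only the first value
second = next(squares_gen)  # computes only the second value

print(first, second)        # → 0 1
print(calls)                # → [0, 1]
```

Creating the generator runs none of the squaring work. Each next() call computes exactly one value, which is what sum() does internally until the generator runs out.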

The Catch: Single Use

>>> gen = (x for x in range(3))
>>> list(gen)
[0, 1, 2]
>>> list(gen)    # Try again
[]               # Empty! Generator exhausted.

Generators can only be iterated once. After that, they're spent.
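If you need a second pass but still want lazy evaluation, one common workaround is to wrap the expression in a function so every call hands back a fresh generator. A minimal sketch, with squares as an illustrative name:

```python
def squares(n):
    """Return a brand-new generator of squares on every call."""
    return (x ** 2 for x in range(n))


total = sum(squares(5))      # first pass
largest = max(squares(5))    # second pass gets its own generator

print(total, largest)        # → 30 16

# Reusing the same generator object still shows the catch:
spent = squares(5)
first_pass = list(spent)
second_pass = list(spent)

print(first_pass)            # → [0, 1, 4, 9, 16]
print(second_pass)           # → []
```

The trade-off is that the values are recomputed on each pass. If recomputing is expensive and the data fits in memory, a list is the simpler choice.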

When to Use Each

Generator Expression            | List Comprehension
Large data, single pass         | Need random access
Feeding into sum(), max(), etc. | Need to iterate multiple times
Memory is a concern             | Need len() or indexing
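The right-hand column is easy to check for yourself: a generator has no length and no positions, so len() and indexing both raise TypeError. A quick sketch (the flag names are just for this example):

```python
squares_list = [x ** 2 for x in range(10)]
squares_gen = (x ** 2 for x in range(10))

# Lists support len() and indexing.
print(len(squares_list))   # → 10
print(squares_list[3])     # → 9

# Generators support neither.
len_failed = False
try:
    len(squares_gen)
except TypeError as exc:
    len_failed = True
    print("len() failed:", exc)

index_failed = False
try:
    squares_gen[3]
except TypeError as exc:
    index_failed = True
    print("indexing failed:", exc)
```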

Pro Tip: Skip the Extra Parentheses

When a generator is the only argument to a function, you don't need double parentheses:

# These are equivalent
sum((x ** 2 for x in range(10)))
sum(x ** 2 for x in range(10))   # Cleaner

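# Same with other functions (sample data so the lines run as-is)
words = ["generator", "list"]
numbers = [3, 7]
results = [1, 2]
max(len(word) for word in words)
any(x > 5 for x in numbers)
all(x > 0 for x in results)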
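The flip side of this tip: as soon as the call has another argument, the generator needs its own parentheses again. A short sketch with made-up sample data:

```python
words = ["generator", "list", "tuple"]  # sample data for illustration

# Sole argument: the call's own parentheses are enough.
longest = max(len(word) for word in words)

# With a second positional argument (sum's start value), parenthesize it.
total = sum((len(word) for word in words), 100)

# Same with a keyword argument.
ordered = sorted((len(word) for word in words), reverse=True)

print(longest, total, ordered)  # → 9 118 [9, 5, 4]

# Leaving the parentheses out is a SyntaxError, caught here via compile().
needs_parens = False
try:
    compile("sum(len(w) for w in words, 100)", "<example>", "eval")
except SyntaxError:
    needs_parens = True

print(needs_parens)  # → True
```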

The Takeaway

Those brackets aren't just syntax—they're a memory decision.

  • [...] = "Build everything now"
  • (...) = "I'll figure it out as I go"

When you're processing large data and only need one pass, drop the brackets. Your RAM will thank you.


This is adapted from my upcoming book, Zero to AI Engineer: Python Foundations.
I share excerpts like this on Substack → https://substack.com/@samuelochaba

Top comments (2)

Mr. 0x1

Great article! I took your advice to "measure it" and ran a script using tracemalloc. The difference is huge—I saw ~38MB for the list comprehension vs basically 0MB for the generator.

Just for fun, if you want to take efficiency to the extreme for this specific problem (sum of squares), you can actually do it in O(1) time and memory using the closed-form formula: n * (n - 1) * (2 * n - 1) // 6.

I benchmarked that against the generator and it's another ~40,000x speedup! But clearly, for generic data processing where math shortcuts don't exist, the generator is the way to go.

Samuel Ochaba

Thanks for actually measuring it! Love that you validated with tracemalloc. That ~38MB vs 0MB comparison really drives the point home.

And yes! The closed-form formula n * (n - 1) * (2 * n - 1) // 6 is the ultimate flex for this specific problem. Math always wins when it applies.

You've basically illustrated the performance hierarchy:

  • O(1) math trick → When the problem has a formula
  • Generator → When you need to process data once
  • List → When you need random access or multiple passes
Great addition to the discussion—appreciate you sharing the benchmark!
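
For anyone who wants to check the closed-form formula from the comments, here is a minimal sketch that compares it against the generator version. The function name sum_squares_closed_form is made up for this example; the formula itself is the one quoted above for range(n), which runs from 0 to n - 1.

```python
def sum_squares_closed_form(n):
    """Sum of x ** 2 for x in range(n), i.e. 0**2 + 1**2 + ... + (n - 1)**2."""
    return n * (n - 1) * (2 * n - 1) // 6


# Spot-check the formula against the generator expression for a few sizes.
for n in (0, 1, 2, 10, 1000):
    assert sum_squares_closed_form(n) == sum(x ** 2 for x in range(n))

# The article's original example, with no loop at all.
result = sum_squares_closed_form(1_000_000)
print(result)
```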