DEV Community

Iman

Stop Using .iterrows(). Here's What Actually Fast Looks Like

You're looping over a DataFrame. It feels natural. It's killing your performance.

# What most tutorials say
for index, row in df.iterrows():
    df.at[index, 'tax'] = row['price'] * 0.17

Here's the progression you should actually know:

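import numpy as np

# Level 1: Vectorization: one expression over the whole column, orders of magnitude faster than .iterrows()
df['tax'] = df['price'] * 0.17

# Level 2: .apply() for conditional logic: far faster than .iterrows(), but still calls the lambda once per value
df['tax'] = df['price'].apply(lambda x: x * 0.17 if x > 0 else 0)

# Level 3: np.where: the same condition, fully vectorized, and the fastest way to express it
df['tax'] = np.where(df['price'] > 0, df['price'] * 0.17, 0)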
| Method | Time (1M rows) |
| --- | --- |
| .iterrows() | ~480s |
| .apply() | ~3s |
| Vectorized / np.where | ~0.04s |
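Timings like these depend heavily on hardware and pandas version, so it's worth measuring on your own machine. Here's a minimal sketch of how you might do that. The helper names (`tax_iterrows`, `tax_apply`, `tax_where`, `time_call`), the random test data, and the 10,000-row size are all assumptions for illustration, chosen so the slow loop finishes quickly.

```python
import time

import numpy as np
import pandas as pd


def tax_iterrows(df, rate=0.17):
    # Row-by-row loop, mirroring the pattern the post warns against.
    out = df.copy()
    out["tax"] = 0.0
    for index, row in out.iterrows():
        out.at[index, "tax"] = row["price"] * rate if row["price"] > 0 else 0.0
    return out["tax"]


def tax_apply(df, rate=0.17):
    # Still a Python-level call per value, but without per-row Series objects.
    return df["price"].apply(lambda x: x * rate if x > 0 else 0.0)


def tax_where(df, rate=0.17):
    # Fully vectorized: the condition and the arithmetic run on whole arrays.
    return np.where(df["price"] > 0, df["price"] * rate, 0.0)


def time_call(fn, df):
    # Single-shot wall-clock timing; good enough to see the gap, not a rigorous benchmark.
    start = time.perf_counter()
    result = fn(df)
    return time.perf_counter() - start, result


# Hypothetical test data: 10,000 prices, some of them negative.
rng = np.random.default_rng(0)
df = pd.DataFrame({"price": rng.uniform(-10, 100, size=10_000)})

timings, results = {}, {}
for name, fn in [("iterrows", tax_iterrows), ("apply", tax_apply), ("np.where", tax_where)]:
    timings[name], results[name] = time_call(fn, df)
    print(f"{name:10s} {timings[name]:.4f}s")
```

All three functions should produce the same tax values; only the time to get there differs. Scale `size` up gradually if you want to approach the 1M-row case.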

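Pandas is built on NumPy, and NumPy operates on entire arrays in compiled C code. The moment you loop row by row, you throw that away: .iterrows() builds a new Series object for every row, and every operation on it runs in the Python interpreter.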
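To see what the row loop gives up, you can inspect what each approach actually hands you. This is a small sketch on a made-up three-row frame; the `qty` column is an assumption added only to show the dtype effect.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one integer column and one float column.
df = pd.DataFrame({"qty": [1, 2, 3], "price": [10.0, 20.0, 30.0]})

# A column is backed by a single typed NumPy array.
prices = df["price"].to_numpy()
print(type(prices), prices.dtype)

# iterrows() yields (index, Series) pairs: one freshly built Series per row.
index, row = next(df.iterrows())
print(type(row))

# A row spans several columns, so its values are coerced to one common dtype.
print(df["qty"].dtype, row.dtype)
```

The column keeps its own dtype, while the row Series is upcast to float64 so that `qty` and `price` fit in one object. That per-row construction and coercion is overhead the column-wise version never pays.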

The shift: instead of asking "what do I do to each row?", ask "what transformation applies to this column?"
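The same column-first question still works when the rule has more than two branches. As a sketch, NumPy's `np.select` picks a value per element from a list of conditions. The tiered rates below (0%, 17%, 20%) and the 100 threshold are hypothetical, invented only to illustrate the pattern.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [-5.0, 0.0, 40.0, 250.0]})
price = df["price"].to_numpy()

# Hypothetical tiers: no tax on non-positive prices, 17% up to 100, 20% above.
# Conditions are checked in order; the first one that matches wins.
conditions = [price <= 0, price <= 100]
rates = [0.0, 0.17]
rate = np.select(conditions, rates, default=0.20)

# One column-wide multiplication instead of an if/elif chain per row.
df["tax"] = price * rate
print(df)
```

Each condition is a boolean array over the whole column, so the branching happens inside NumPy and not in a Python loop.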

That's it. Notebooks that took minutes will now run in seconds.

