Stop Using .iterrows(). Here's What Fast Actually Looks Like

#python #pandas #datascience

You're looping over a DataFrame. It feels natural. It's killing your performance.

# What most tutorials show: a Python-level loop that builds a Series for every row
for index, row in df.iterrows():
    df.at[index, 'tax'] = row['price'] * 0.17

Here's the progression you should actually know:

# Level 1: Vectorization (roughly 10-100x faster than .iterrows(), often more on large frames)
df['tax'] = df['price'] * 0.17

# Level 2: .apply() for conditional logic (usually beats .iterrows(), but still calls Python once per element)
df['tax'] = df['price'].apply(lambda x: x * 0.17 if x > 0 else 0)

# Level 3: np.where for vectorized conditional logic (usually far faster than .apply())
import numpy as np
df['tax'] = np.where(df['price'] > 0, df['price'] * 0.17, 0)
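
Don't take the speedup numbers on faith; measure them on your own data. Here's a rough sketch of how you might do that. The random test frame, the 5,000-row size, and the helper names (with_iterrows, with_apply, with_where) are all made up for illustration, and the timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 5,000 random prices, some of them negative.
rng = np.random.default_rng(0)
df = pd.DataFrame({"price": rng.uniform(-10, 100, size=5_000)})


def with_iterrows(frame):
    # Row-by-row Python loop; each row is materialized as a Series.
    taxes = []
    for _, row in frame.iterrows():
        taxes.append(row["price"] * 0.17 if row["price"] > 0 else 0.0)
    return pd.Series(taxes, index=frame.index)


def with_apply(frame):
    # Still one Python function call per element, but with less overhead.
    return frame["price"].apply(lambda x: x * 0.17 if x > 0 else 0.0)


def with_where(frame):
    # Fully vectorized: the condition and the arithmetic run on whole arrays.
    values = np.where(frame["price"] > 0, frame["price"] * 0.17, 0.0)
    return pd.Series(values, index=frame.index)


# Check that all three approaches agree before comparing speed.
assert np.allclose(with_iterrows(df), with_where(df))
assert np.allclose(with_apply(df), with_where(df))

# Time each approach; the absolute numbers depend on your hardware.
for fn in (with_iterrows, with_apply, with_where):
    seconds = timeit.timeit(lambda: fn(df), number=3) / 3
    print(f"{fn.__name__:>14}: {seconds * 1000:.2f} ms per run")
```

Checking that the outputs match first matters: a fast version that computes something different isn't an optimization.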