This is part 1 of my Pandas Gotchas series — short, sharp lessons on mistakes that trip up even experienced developers (and show up in interviews).
Let’s start with a classic:
df[df['category'] == 'electronics']['price'] *= 0.9
At first glance, it looks fine. You’re applying a 10% discount to electronics.
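But here’s the catch: chained selections like this are not guaranteed to update df. Some forms happen to work, others don’t, and this one (a boolean mask first) operates on a copy.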
- Did the original DataFrame change?
- Or did your code silently fail?
If you’re unsure, welcome to one of Pandas’ most confusing traps: views vs. copies and the dreaded chained indexing.
This question isn't just about syntax; it's about your understanding of how Pandas works under the hood.
The Core Problem: Views vs. Copies
When you select a subset of a DataFrame, Pandas might return one of two things:
- A View: This is a "window" into the original DataFrame. If you modify a view, the original DataFrame is also modified. This is memory-efficient.
- A Copy: This is a brand new DataFrame, completely independent of the original. Modifying the copy will not change the original.
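The danger is that Pandas doesn't guarantee which one you'll get: it depends on the kind of indexing you use, how the data is laid out internally, and the Pandas version. This ambiguity is the source of the problem.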
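The view/copy distinction is easiest to see in NumPy, which Pandas builds on. The snippet below is only an analogy under that assumption, not a description of Pandas' own rules (those are more involved and vary by version). The array and variable names are made up for illustration.

```python
import numpy as np

prices = np.array([100.0, 20.0, 50.0])

# A basic slice is a view: it looks at the same memory as `prices`.
window = prices[0:2]
window[0] = 1.0  # this write is visible through `prices` too

# Boolean indexing builds a new array: a copy.
mask = np.array([True, False, True])
picked = prices[mask]
picked[0] = -1.0  # this write stays inside `picked`

shares_view = np.shares_memory(prices, window)  # view shares memory
shares_copy = np.shares_memory(prices, picked)  # copy does not
print(prices.tolist(), shares_view, shares_copy)
```

Writing through the view changes the original array, while writing to the copy leaves it alone. That is exactly the difference you can't see just by looking at a chained Pandas expression.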
The Culprit: Chained Indexing
The problematic code df[df['category'] == 'electronics']['price'] *= 0.9 is an example of chained indexing. Let's break it down:
- df[df['category'] == 'electronics']: This is the first operation. Pandas executes this and returns a DataFrame. Is it a view or a copy? In general you can't rely on either, and a boolean-mask selection like this one returns a copy. Let's call this temporary result temp_df.
- ['price'] *= 0.9: This second operation is performed on temp_df.
If temp_df was a copy, you just modified a temporary object that is immediately discarded. The original df remains unchanged. This is a silent failure – the worst kind of bug.
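Here is a minimal sketch of that two-step breakdown, using a small made-up DataFrame (the data is an assumption for illustration). Warnings are silenced only because which warning appears, if any, depends on the Pandas version.

```python
import warnings

import pandas as pd

df = pd.DataFrame({
    "category": ["electronics", "books", "electronics"],
    "price": [100.0, 20.0, 50.0],
})

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide version-dependent warnings for this demo
    temp_df = df[df["category"] == "electronics"]  # step 1: boolean-mask selection
    temp_df["price"] *= 0.9                        # step 2: updates temp_df only

original_prices = df["price"].tolist()   # still the undiscounted prices
temp_prices = temp_df["price"].tolist()  # discounted, but only in temp_df
print(original_prices, temp_prices)
```

In the one-line chained version, temp_df never gets a name, so the discounted values are thrown away as soon as the statement finishes.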
The [SettingWithCopyWarning](https://pandas.pydata.org/docs/reference/api/pandas.errors.SettingWithCopyWarning.html) is Pandas' way of telling you, "Hey, you're trying to modify something that might be a copy. I can't be sure if this will work as you expect."
The Solution: Use .loc for Assignments
The correct, unambiguous way to perform this operation is with the .loc indexer. .loc allows you to specify the rows and columns you want to access or modify in a single operation.
The syntax is .loc[row_indexer, column_indexer].
Incorrect (Chained Indexing):
df[df['category'] == 'electronics']['price'] *= 0.9
Correct (Using .loc):
df.loc[df['category'] == 'electronics', 'price'] *= 0.9
This code guarantees that the modification happens on the original df. You are explicitly telling Pandas: "In the DataFrame df, find the rows where category is electronics, select the price column for those rows, and update it."
No ambiguity, no warning, no silent failures.
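To check this yourself, here is a small runnable sketch with made-up data (the DataFrame contents and the is_electronics name are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["electronics", "books", "electronics"],
    "price": [100.0, 20.0, 50.0],
})

# Row selection and column selection happen in one .loc call,
# so the assignment targets df itself rather than a temporary object.
is_electronics = df["category"] == "electronics"
df.loc[is_electronics, "price"] *= 0.9

discounted = df["price"].tolist()
print(discounted)
```

Only the electronics rows are discounted, and the change lands in df. Naming the mask first is optional, but it keeps the .loc line readable when the condition grows.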
Takeaway:
🚨 Never trust chained indexing in Pandas.
✅ Always use .loc when modifying data.
It’s not just cleaner — it saves you from nasty bugs and makes your intent unambiguous.
🚀 Stay tuned — this is just Part 1 of the Pandas Gotchas series.
In the upcoming parts, we’ll cover more subtle traps, performance quirks, and interview-style puzzles that every Pandas user should know.