Similar method names are hard to keep separate in your mind, which makes them harder to remember.
Pandas has a slew of methods for creating and adjusting a DataFrame index.
This is a brief guide to help you create a little mental space between methods for easier memorization.
The Jupyter Notebook is on Kaggle here.
import pandas as pd
import numpy as np
Make a DataFrame without specifying an index (you get a default index).
df = pd.DataFrame(dict(a=[1,2,3,4], b=[2,5,6,4]))
df
| | a | b |
|---|---|---|
| 0 | 1 | 2 |
| 1 | 2 | 5 |
| 2 | 3 | 6 |
| 3 | 4 | 4 |
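To see what "default index" means in practice, here is a minimal sketch that inspects the index of the same DataFrame. It only uses the `df` defined above.

```python
import pandas as pd

df = pd.DataFrame(dict(a=[1, 2, 3, 4], b=[2, 5, 6, 4]))

# With no index argument, pandas builds a RangeIndex counting up from 0.
print(type(df.index).__name__)  # RangeIndex
print(list(df.index))           # [0, 1, 2, 3]
```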
Make a DataFrame with an index by using the index keyword argument.
df2 = pd.DataFrame(dict(a=[1,2,3,4], b=[2,5,6,4]), index=[1,2,5,6])
df2
| | a | b |
|---|---|---|
| 1 | 1 | 2 |
| 2 | 2 | 5 |
| 5 | 3 | 6 |
| 6 | 4 | 4 |
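One practical effect of a custom index is that label lookups use the index values rather than row positions. The sketch below assumes the `.loc` and `.iloc` accessors, which this guide does not otherwise cover, purely to illustrate the difference.

```python
import pandas as pd

df2 = pd.DataFrame(dict(a=[1, 2, 3, 4], b=[2, 5, 6, 4]), index=[1, 2, 5, 6])

# .loc looks rows up by index label; .iloc looks them up by position.
print(df2.loc[5, "a"])   # 3 (the row labeled 5)
print(df2.iloc[2, 0])    # 3 (the third row, first column)

# The label 0 no longer exists, because the index is now [1, 2, 5, 6].
print(0 in df2.index)    # False
```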
Move a column to be the index with .set_index()
df3 = df2.set_index("a")
df3
| | b |
|---|---|
| a | |
| 1 | 2 |
| 2 | 5 |
| 3 | 6 |
| 4 | 4 |
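It may help to confirm what `.set_index()` changed and what it left alone. This is a small sketch using the same `df2`: the column moves out of the columns and into the index, and the original DataFrame is untouched because a new one is returned.

```python
import pandas as pd

df2 = pd.DataFrame(dict(a=[1, 2, 3, 4], b=[2, 5, 6, 4]), index=[1, 2, 5, 6])
df3 = df2.set_index("a")

# The column "a" is now the index, and the index carries its name.
print(list(df3.columns))  # ['b']
print(df3.index.name)     # a

# set_index() returned a new DataFrame; df2 still has both columns.
print(list(df2.columns))  # ['a', 'b']
```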
Rename the index values from scratch with .index
df3.index = [2,3,4,5]
df3
| | b |
|---|---|
| 2 | 2 |
| 3 | 5 |
| 4 | 6 |
| 5 | 4 |
Note that index is a property of the DataFrame, not a method, so you assign to it instead of calling it.
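Two details of assigning to `.index` are easy to miss, so here is a hedged sketch of both. The DataFrame is rebuilt to match `df3` before the assignment, and the `caught` flag is a hypothetical helper used only for illustration.

```python
import pandas as pd

# Rebuild df3 as it was after set_index("a").
df3 = pd.DataFrame(dict(b=[2, 5, 6, 4]), index=pd.Index([1, 2, 3, 4], name="a"))

# Assigning a plain list replaces the whole index, including its name.
df3.index = [2, 3, 4, 5]
print(df3.index.name)  # None

# The new index must have one label per row, otherwise pandas raises ValueError.
caught = False
try:
    df3.index = [2, 3, 4]
except ValueError as err:
    caught = True
    print("ValueError:", err)
```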
Nuke the index values and start over from 0 with .reset_index()
df4 = df3.reset_index()
df4
| | index | b |
|---|---|---|
| 0 | 2 | 2 |
| 1 | 3 | 5 |
| 2 | 4 | 6 |
| 3 | 5 | 4 |
If you don't want the old index to become a column, pass drop=True to reset_index().
df5 = df3.reset_index(drop=True)
df5
| | b |
|---|---|
| 0 | 2 |
| 1 | 5 |
| 2 | 6 |
| 3 | 4 |
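The new column was called index in the first reset_index() output because the index had no name at that point. When the index does have a name, that name is used for the column. A minimal sketch, reusing `df2` and introducing a hypothetical `restored` variable:

```python
import pandas as pd

df2 = pd.DataFrame(dict(a=[1, 2, 3, 4], b=[2, 5, 6, 4]), index=[1, 2, 5, 6])

# set_index("a") names the index "a", so reset_index() puts the column back as "a".
restored = df2.set_index("a").reset_index()

print(list(restored.columns))  # ['a', 'b']
print(list(restored.index))    # [0, 1, 2, 3]
```

In other words, set_index() followed by reset_index() gets the column back, but the original index labels (1, 2, 5, 6) are gone.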
Reorder the rows with .reindex()
df6 = df5.reindex([2,3,1,0])
df6
| | b |
|---|---|
| 2 | 6 |
| 3 | 4 |
| 1 | 5 |
| 0 | 2 |
Passing a label that isn't in the index produces a row of NaN values, and the integer column is upcast to float to hold them.
df7 = df5.reindex([2,3,1,0,6])
df7
| | b |
|---|---|
| 2 | 6.0 |
| 3 | 4.0 |
| 1 | 5.0 |
| 0 | 2.0 |
| 6 | NaN |
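If NaN is not what you want for missing labels, reindex() also accepts a fill_value argument. The sketch below is one way to use it; `df8` is a hypothetical name, and 0 is an arbitrary placeholder chosen for the example.

```python
import pandas as pd

df5 = pd.DataFrame(dict(b=[2, 5, 6, 4]))

# Default behavior: the missing label 6 gets NaN, which forces the column to float.
df7 = df5.reindex([2, 3, 1, 0, 6])
print(df7["b"].dtype)     # float64

# With fill_value, the missing label gets the placeholder instead of NaN.
df8 = df5.reindex([2, 3, 1, 0, 6], fill_value=0)
print(df8["b"].tolist())  # [6, 4, 5, 2, 0]
```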
Advice
Ideally, add an index when you create your DataFrame with the index keyword argument.
If reading from a .csv file, you can set an index column by passing its zero-based column number to index_col.
For example:
df = pd.read_csv(my_csv, index_col=3)
Or pass index_col=False to keep pandas from using the first column as the index.
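Here is a self-contained sketch of those index_col options. It assumes a small made-up CSV held in memory with io.StringIO instead of a real file; the column names and values are invented for the example.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "id,a,b\n10,1,2\n20,2,5\n30,3,6\n"

# index_col=0 uses the first column (zero-based) as the index.
by_number = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(by_number.index))    # [10, 20, 30]
print(list(by_number.columns))  # ['a', 'b']

# A column name works as well as a column number.
by_name = pd.read_csv(io.StringIO(csv_text), index_col="id")
print(by_name.index.name)       # id

# index_col=False keeps every column as data and uses the default 0, 1, 2 ... index.
no_index = pd.read_csv(io.StringIO(csv_text), index_col=False)
print(list(no_index.columns))   # ['id', 'a', 'b']
```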
How to set or change the index:
- df.set_index(): move a column to the index
- df.index: add an index manually
- df.reset_index(): reset the index to 0, 1, 2 ...
- df.reindex(): reorder the rows
Word associations to remember:
- set_index(): move column
- index: manual
- reset_index(): reset
- reindex(): reorder
Wrap
I hope this article helped you create a little mental space to keep Pandas index methods straight. If it did, please give it some love so other people can find it, too.
I write about Data Science, DevOps, Python and other stuff. Check out my other articles if any of that sounds interesting.
Happy indexing!