When method names are similar, it's hard to keep them separate in your mind, which makes them harder to remember.
Pandas has a slew of methods for creating and adjusting a DataFrame index.
This is a brief guide to help you create a little mental space between methods for easier memorization.
The Jupyter Notebook is on Kaggle here.
import pandas as pd
import numpy as np
Make a DataFrame without specifying an index (you get a default index).
df = pd.DataFrame(dict(a=[1,2,3,4], b=[2,5,6,4]))
df
|   | a | b |
|---|---|---|
| 0 | 1 | 2 |
| 1 | 2 | 5 |
| 2 | 3 | 6 |
| 3 | 4 | 4 |
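If you want to see what that default index actually is, a quick check like the one below may help. It's a minimal sketch that rebuilds the same DataFrame and inspects its index.

```python
import pandas as pd

df = pd.DataFrame(dict(a=[1, 2, 3, 4], b=[2, 5, 6, 4]))

# With no index argument, pandas builds a RangeIndex that starts at 0.
print(type(df.index).__name__)  # RangeIndex
print(list(df.index))           # [0, 1, 2, 3]
```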
Make a DataFrame with an index by using the index keyword argument.
df2 = pd.DataFrame(dict(a=[1,2,3,4], b=[2,5,6,4]), index = [1,2,5,6])
df2
|   | a | b |
|---|---|---|
| 1 | 1 | 2 |
| 2 | 2 | 5 |
| 5 | 3 | 6 |
| 6 | 4 | 4 |
Move a column to be the index with .set_index()
df3 = df2.set_index("a")
df3
|   | b |
|---|---|
| a |   |
| 1 | 2 |
| 2 | 5 |
| 3 | 6 |
| 4 | 4 |
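Two details of .set_index() are easy to miss: it returns a new DataFrame rather than changing the original, and the column leaves the columns unless you ask to keep it. The sketch below illustrates both; the variable name kept is just for illustration.

```python
import pandas as pd

df2 = pd.DataFrame(dict(a=[1, 2, 3, 4], b=[2, 5, 6, 4]), index=[1, 2, 5, 6])

# set_index returns a new DataFrame; df2 keeps its original index and columns.
df3 = df2.set_index("a")
print(list(df2.columns))  # ['a', 'b']
print(list(df3.columns))  # ['b']
print(df3.index.name)     # a

# drop=False keeps "a" as a column in addition to using it as the index.
kept = df2.set_index("a", drop=False)
print(list(kept.columns))  # ['a', 'b']
```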
Rename the index values from scratch with .index
df3.index = [2,3,4,5]
df3
|   | b |
|---|---|
| 2 | 2 |
| 3 | 5 |
| 4 | 6 |
| 5 | 4 |
Note that index is a property of the DataFrame, not a method, so you assign to it instead of calling it.
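Because .index is assigned rather than called, it changes the DataFrame in place, and the new labels have to match the number of rows. Here is a small sketch of both behaviors, using a stand-in df3 built directly so the block runs on its own.

```python
import pandas as pd

# Stand-in for df3 from above: one column "b" with index labels 1 to 4.
df3 = pd.DataFrame(dict(b=[2, 5, 6, 4]), index=[1, 2, 3, 4])

# Attribute assignment: no parentheses, and df3 itself is modified.
df3.index = [2, 3, 4, 5]
labels_after_assign = list(df3.index)
print(labels_after_assign)  # [2, 3, 4, 5]

# Assigning the wrong number of labels raises a ValueError.
raised = False
try:
    df3.index = [10, 20]
except ValueError as err:
    raised = True
    print("ValueError:", err)
```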
Nuke the index values and start over from 0 with .reset_index()
df4 = df3.reset_index()
df4
|   | index | b |
|---|---|---|
| 0 | 2 | 2 |
| 1 | 3 | 5 |
| 2 | 4 | 6 |
| 3 | 5 | 4 |
If you don't want the index to become a column, pass drop=True to reset_index().
df5 = df3.reset_index(drop=True)
df5
|   | b |
|---|---|
| 0 | 2 |
| 1 | 5 |
| 2 | 6 |
| 3 | 4 |
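The new column was called "index" above because the index had no name after being assigned by hand. When the index does have a name, .reset_index() uses that name for the column instead. A minimal sketch of the difference, with named and unnamed as illustrative variable names:

```python
import pandas as pd

df2 = pd.DataFrame(dict(a=[1, 2, 3, 4], b=[2, 5, 6, 4]), index=[1, 2, 5, 6])

# An index created with set_index("a") is named "a", so the column comes back as "a".
named = df2.set_index("a")
print(list(named.reset_index().columns))  # ['a', 'b']

# An index without a name comes back as a column called "index".
unnamed = pd.DataFrame(dict(b=[2, 5, 6, 4]), index=[2, 3, 4, 5])
print(list(unnamed.reset_index().columns))  # ['index', 'b']

# drop=True discards the old index instead of turning it into a column.
print(list(unnamed.reset_index(drop=True).columns))  # ['b']
```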
Reorder the rows with .reindex()
df6 = df5.reindex([2,3,1,0])
df6
|   | b |
|---|---|
| 2 | 6 |
| 3 | 4 |
| 1 | 5 |
| 0 | 2 |
Passing a label that isn't in the index adds a row of NaN values, and the integer column becomes float to hold the NaN.
df7 = df5.reindex([2,3,1,0,6])
df7
|   | b |
|---|---|
| 2 | 6.0 |
| 3 | 4.0 |
| 1 | 5.0 |
| 0 | 2.0 |
| 6 | NaN |
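If the NaN isn't what you want, .reindex() also accepts a fill_value argument that supplies a replacement for missing labels. The sketch below rebuilds df5 so it runs on its own; df8 and the choice of 0 as the fill value are just for illustration.

```python
import pandas as pd

df5 = pd.DataFrame(dict(b=[2, 5, 6, 4]))

# Default behavior: the missing label 6 gets NaN.
df7 = df5.reindex([2, 3, 1, 0, 6])
print(df7["b"].isna().sum())  # 1

# fill_value replaces the missing entry, so the column can stay integer.
df8 = df5.reindex([2, 3, 1, 0, 6], fill_value=0)
print(df8["b"].tolist())  # [6, 4, 5, 2, 0]
print(pd.api.types.is_integer_dtype(df8["b"]))  # True
```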
Advice
Ideally, set the index when you create your DataFrame by passing index=.
If you're reading from a .csv file, you can set an index column by passing its zero-based column number (or its name) to index_col.
For example:
df = pd.read_csv(my_csv, index_col=3)
Or pass index_col=False to force pandas not to use the first column as the index.
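To try index_col without a file on disk, you can read from an in-memory string. This is a minimal sketch; the CSV text and the column name "id" are made up for the example.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file.
csv_text = "id,a,b\n10,1,2\n11,2,5\n12,3,6\n"

# index_col takes a zero-based column position or a column name.
by_position = pd.read_csv(io.StringIO(csv_text), index_col=0)
by_name = pd.read_csv(io.StringIO(csv_text), index_col="id")
print(list(by_position.index))      # [10, 11, 12]
print(by_position.equals(by_name))  # True

# Without index_col, this file gets the default 0, 1, 2 index.
default = pd.read_csv(io.StringIO(csv_text))
print(list(default.index))    # [0, 1, 2]
print(list(default.columns))  # ['id', 'a', 'b']
```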
How to set or change the index:
- df.set_index() - move a column to the index
- df.index - set the index labels manually
- df.reset_index() - reset the index to 0, 1, 2 ...
- df.reindex() - reorder the rows
Word associations to remember:
- set_index() - move column
- index - manual
- reset_index() - reset
- reindex() - reorder
Wrap
I hope this article helped you create a little mental space to keep Pandas index methods straight. If it did, please give it some love so other people can find it, too.
I write about data science, DevOps, Python, and other stuff. Check out my other articles if any of that sounds interesting.
Happy indexing!