DEV Community

Timothy Cummins

Sorting Your Pandas DataFrame

Introduction

Since writing my first blog post on using pandasql, I have meant to continue with some other short tutorials on awesome pandas features for organizing your data. So today I am going to talk about using the sort_values method to organize your DataFrame.

sort_values is a great method for reorganizing all of that data you found into an understandable table that is neat and tidy and won’t set off the OCD that I know all of you ‘data nerds’ have. If you want to follow along with this example, you can find the data at https://catalog.data.gov/dataset/hpd-crime-incidents under Comma Separated Values File. If the file lands in the Downloads folder of your home directory (the ~ in the path below), you can copy exactly what I have; if not, you will have to change the file path below to wherever you saved the downloaded file.

Importing Data to Pandas

First, we are going to import the pandas library for Python, which makes it easy to create, sort, and organize a DataFrame. Then we will have pandas read the .csv file directly into a DataFrame.

import pandas as pd

# read the downloaded CSV file into a DataFrame
data = pd.read_csv('~/Downloads/HPD_Crime_Incidents.csv')
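If you don't have the download handy, here is a minimal self-contained sketch that reads a tiny CSV held in memory instead of on disk. The rows are made up for illustration; only the column names ObjectID, BlockAddress and Type are taken from the columns used in this post.

```python
import io

import pandas as pd

# A few hypothetical rows standing in for HPD_Crime_Incidents.csv
csv_text = """ObjectID,BlockAddress,Type
1,200 MAIN ST,THEFT
2,100 KING ST,VANDALISM
3,200 MAIN ST,BURGLARY
4,100 KING ST,THEFT
"""

# read_csv accepts any file-like object, not just a path on disk
data = pd.read_csv(io.StringIO(csv_text))
print(data.head())
```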

Taking a look at our DataFrame, we can see that the data is ordered by ObjectID, which appears to follow the order in which the crimes were filed. But let's say we want to be able to scroll through our data with the crimes organized by type of crime, by location, or even both.
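To check that the rows really come in ObjectID order, Series.is_monotonic_increasing is handy: it is True when every value is greater than or equal to the one before it. The sketch below runs it on a small hypothetical sample that uses the same column names as the real file; on the downloaded data you would run only the last line against your own data DataFrame.

```python
import pandas as pd

# Hypothetical sample, in the original (ObjectID) order
data = pd.DataFrame({
    "ObjectID": [1, 2, 3, 4],
    "BlockAddress": ["200 MAIN ST", "100 KING ST", "200 MAIN ST", "100 KING ST"],
    "Type": ["THEFT", "VANDALISM", "BURGLARY", "THEFT"],
})

# True if each ObjectID is >= the previous one
print(data["ObjectID"].is_monotonic_increasing)
```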

Sorting

Now let's look at the main parameter of sort_values, “by”. This is where you pass the name of the column you want to sort by, or a list of columns in order of priority. Below, I sort the data by the location of the crime (BlockAddress) and then by the type of crime (Type), so that the crime types are grouped within each location.

sortdata = data.sort_values(by=['BlockAddress', 'Type'])
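To see what the list does, here is a small sketch on a hypothetical four-row sample with the post's column names. The first column in the list decides the order, and the second only breaks ties within it. (by also accepts a single column name, such as by='Type'.)

```python
import pandas as pd

# Hypothetical sample, in the original (ObjectID) order
data = pd.DataFrame({
    "ObjectID": [1, 2, 3, 4],
    "BlockAddress": ["200 MAIN ST", "100 KING ST", "200 MAIN ST", "100 KING ST"],
    "Type": ["THEFT", "VANDALISM", "BURGLARY", "THEFT"],
})

# Sort by location first; rows sharing a BlockAddress are then ordered by Type
sortdata = data.sort_values(by=["BlockAddress", "Type"])
print(sortdata)
```

Both 100 KING ST rows come first (THEFT before VANDALISM), followed by the 200 MAIN ST rows (BURGLARY before THEFT). The index labels travel with their rows, which is what the next section deals with.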

Resetting the Index

Now that our DataFrame is sorted the way we would like, you may notice that the index still holds the original row labels, so it no longer counts up from 0. Fear not: if you look a little deeper into the parameters of sort_values, you will find one called ignore_index, which is key to tidying up your new DataFrame. It defaults to False, but if you set it to True (available since pandas 1.0), the sorted result is relabeled 0 to n - 1 in its new order.

sortdata = data.sort_values(by=['BlockAddress', 'Type'], ignore_index=True)
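Here is a minimal side-by-side sketch on the same kind of hypothetical sample, showing the index with and without ignore_index. The rows end up in the same order either way; only the labels differ.

```python
import pandas as pd

# Hypothetical sample, in the original (ObjectID) order
data = pd.DataFrame({
    "ObjectID": [1, 2, 3, 4],
    "BlockAddress": ["200 MAIN ST", "100 KING ST", "200 MAIN ST", "100 KING ST"],
    "Type": ["THEFT", "VANDALISM", "BURGLARY", "THEFT"],
})

# Default (ignore_index=False): rows keep their original index labels
kept = data.sort_values(by=["BlockAddress", "Type"])

# ignore_index=True: the sorted result is relabeled 0 to n - 1
relabeled = data.sort_values(by=["BlockAddress", "Type"], ignore_index=True)

print(kept.index.tolist())
print(relabeled.index.tolist())
```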

Conclusion

There you have it, folks: whether you are just getting started with pandas or just have OCD, you can now sort and clean up your DataFrame.

Top comments (1)

Waylon Walker

Thanks for sharing what you have learned about pandas. Great tip on the ignore_index flag. I generally use the .reset_index() method.
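For comparison, here is a minimal sketch of the .reset_index() approach from the comment, run on a hypothetical sample with the post's column names. Passing drop=True matters: without it, pandas keeps the old labels as a new column named index.

```python
import pandas as pd

# Hypothetical sample, in the original (ObjectID) order
data = pd.DataFrame({
    "ObjectID": [1, 2, 3, 4],
    "BlockAddress": ["200 MAIN ST", "100 KING ST", "200 MAIN ST", "100 KING ST"],
    "Type": ["THEFT", "VANDALISM", "BURGLARY", "THEFT"],
})

# Sort, then throw away the old labels instead of keeping them as a column
via_reset = data.sort_values(by=["BlockAddress", "Type"]).reset_index(drop=True)

# The same result in one call with ignore_index
via_flag = data.sort_values(by=["BlockAddress", "Type"], ignore_index=True)

print(via_reset.equals(via_flag))
```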