Bhavani Ravi

Posted on Feb 8, 2019 • Originally published at Medium on Feb 7, 2019

Learn Pandas Via Usecases — Part 2

#learning #datascience #python #statistics

Learn Pandas Via Usecases — Part 2

More use cases open up more functionalities

In the last blog, I hope I have sold you the idea that Pandas is an amazing library for quick and easy data analysis and it’s much easier to use than you thought.If you have not read my first blog about Pandas, please go through it before you move forward.

Oops !! We missed Some Data

In the last blog, we saw basic Dataframe operations using sample sales data. Let’s assume you are a manager leading a sales team, and you were all happy about the sales trajectory and the pivot representation of the data you learned to create from ourlast blog.

import numpy as np
df.pivot\_table(index=["Country"], 
               columns=["Region"], 
               values=["Quantity"], 
               aggfunc=[np.sum])

That’s when you realize you have missed sales data of a particular quarterbecause it was lost in one of the spreadsheets. Now, what do you do? You already have a report ready to go. How can you incorporate the new data into the current pivot representation without major changes?

If you see, the pivot table is constructed with a single Dataframe df, somehow if we can find a way to feed our new data into the df then we can just re-run the pivot code and voila!! we will get the report again.

So here are the steps we are going to follow,

1. Load the new spreadsheet data into a new Dataframe

df2 = pd.read\_csv("data/Pandas - Q4 Sales.csv")
df2.head()

2. Combine two Dataframe into a single df object,

Using concat

Pandas Concat method concatenates the contents of multiple Dataframes and creates a new Dataframe.

The axis param of the method enables you to concatenate data along rows or columns

result\_df = pd.concat([df, df2], axis=0, sort=False)
# axis = 0, append along rows, 
# axis = 1, append along cols

result\_df.tail() # tail is similar to head returns last 10 entries

Using append

Unlike concat , the append method adds up data to an existing dataframe instead of creating a new Dataframe. Also, you can notice that we don’t supply any axis parameter here since append method only allows adding new entries as rows.

result\_df = df.append([df2],sort=False)
result\_df.tail()

If you take a closer look, in both cases, the data frames that need to be combined are supplied as a python list [df1, df2]. This implies that we can combine as many Dataframes as we want

3. Re-run the pivot code

pivot = result\_df.pivot\_table(index=["Country"], 
                              columns=["Region"], 
                              values="Quantity")

Charts are better than tables

You have a couple of hours for your final meeting. Your presentation is concrete, your sales are good but still, something is missing. Charts. For a management person who was so used to spreadsheets charts, leaving them behind is not a good idea. But, we have a short time to go back to spreadsheets, don’t we? Worry not, Pandas comes with a built-in charting framework which lets you draw graphs of our pivot representation

Perfection

As a person who was known for your perfection something doesn’t sit well in you. One of the tabular representations that you have created has unnecessary information that doesn’t interest your management, and a couple of columns have names that are used internally in your company and will not ring any bell to the management.

Worry not, we can do it all in one shot and pretty quick. In pandas terms, we call this method chaining.

Method chaining enables you to perform a various transformation on the same data without storing the intermediate result.

Explicit is better than implicit hence let’s rename “Total” to “Total Sales”
We don’t need the date of purchase just the year and quarter
We don’t need the requester of purchase, Salesperson, and Date of purchase. So let’s drop it.

result\_df.rename({"Total": "Total Sales"}, axis=1)\
         .assign(Quarter=result\_df['Date of Purchase'].dt.quarter, \
                 Year=result\_df['Date of Purchase'].dt.year) \
         .drop(["Requester", "Sales Person", "Date of Purchase"], axis=1).head()

One Last Thing

With that, our final report looks good and guess what? Your management is not only happy about your sales this year but also excited about your new found love for Pandas, but there is just one last thing remaining, you need to send the final data as a CSV back to your management. But worry not we have pandas to do it for you.

result\_df.to\_csv(path\_or\_buf="Export\_Data.csv")

An “Export_Data.csv” file would be created in your current path which you can happily send to your management as an email attachment.

As you rest back on your seat, you want to automate the pandas experiment that you just did for the future sales reports. Thankfully, you have an intern who is joining you in a couple of days. It will be a great project for him to pick it up. Something in me tells that things aren’t going to be as easy as it was for you. which we will see in the next blog “What’s wrong with Pandas?”

Did the blog nudge you to deep dive into pandas?

Hold the “claps” icon and give a shout on [_twitter](https://twitter.com/@bhavaniravi)._

Follow to stay tuned on future blogs.

Forem

Learn Pandas Via Usecases — Part 2

Learn Pandas Via Usecases — Part 2

Oops !! We missed Some Data

1. Load the new spreadsheet data into a new Dataframe

2. Combine two Dataframe into a single df object,

3. Re-run the pivot code

Charts are better than tables

Perfection

One Last Thing

Top comments (0)

Read next

AI Models Still Fail Basic Physics Tests, New Benchmark Shows 18.4% Improvement Possible

New Context Pruning Method Cuts AI Memory Use by 50% While Maintaining Accuracy

Clean up S3 with Batch Operations, Tags and Lifecycle policies, they said. It will be cheaper, they said.

How to Configure VSCode for Auto Formatting and Linting in Python