Eyal Trabelsi

Posted on Oct 25, 2019

Introducing Pandas-Log: package for debugging pandas operations

#python #opensource #machinelearning

The pandas ecosystem has been invaluable to data science, and today most data science tasks consist of a series of pandas steps that transform raw data into an understandable, usable format.

The accuracy of these steps is crucial, so understanding unexpected results is crucial as well. Unfortunately, the ecosystem lacks tools for understanding those unexpected results.

That’s why I created Pandas-log. It provides metadata on each operation, which makes it possible to pinpoint issues. For example, after .query() it reports the number of rows that were filtered out.
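
To make the idea concrete, here is a minimal sketch in plain pandas of the kind of metadata meant here: counting rows before and after a filter. The helper `logged_query` and the toy data are hypothetical and only illustrate the concept; this is not how Pandas-log itself is implemented.

```python
import pandas as pd


def logged_query(df: pd.DataFrame, expr: str) -> pd.DataFrame:
    """Run df.query(expr) and print how many rows the filter removed."""
    out = df.query(expr)
    removed = len(df) - len(out)
    print(f"query({expr!r}): removed {removed} rows, {len(out)} rows remaining")
    return out


# Toy data, made up for illustration only.
toy = pd.DataFrame({"name": ["a", "b", "c"], "total": [10, 20, 30]})

# .pipe() lets the hypothetical helper sit inside a method chain.
filtered = toy.pipe(logged_query, "total > 15")
```

Doing this by hand for every step of a chain gets tedious quickly, which is the gap the package aims to fill.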

As always, I believe it’s easier to understand with an example, so I will use the pokemon dataset to find out “who is the weakest non-legendary fire pokemon?”.

So who is the weakest fire pokemon?

(Link to the Notebook code can be found here)
First, we will import relevant packages and read our pokemon dataset.

import pandas as pd
import numpy as np
import pandas_log
df = pd.read_csv("pokemon.csv")
df.head(10)

To answer our question, “who is the weakest non-legendary fire pokemon?”, we will need to:

Filter out legendary pokemon using .query().
Keep only fire pokemon using .query().
Drop the Legendary column using .drop().
Keep the weakest pokemon among them using .nsmallest().

In code, it will look something like:

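res = (df.copy()
         .query("legendary==0")                      # keep non-legendary pokemon
         .query("type_1=='fire' or type_2=='fire'")  # keep fire pokemon
         .drop("legendary", axis=1)                  # drop the legendary column
         .nsmallest(1,"total"))                      # keep the row with the lowest total
res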

Oh no! Our code does not work! We got an empty dataframe!

If only there were a way to track down such issues! Fortunately, that’s what Pandas-log is for.
By wrapping our example in a small context manager, we get relevant information printed to stdout that helps us find the issue.

with pandas_log.enable():
    res = (df.copy()
             .query("legendary==0")
             .query("type_1=='fire' or type_2=='fire'")
             .drop("legendary", axis=1)
             .nsmallest(1,"total"))

After reading the output, it’s clear that the issue is in step 2: 0 rows remain after the second .query(), so something is wrong with the predicate "type_1=='fire' or type_2=='fire'". Indeed, pokemon types in the dataset start with a capital letter, so let’s run the fixed code.
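
One quick way to confirm a mismatch like this is to list the distinct values of the column being filtered. Below is a small sketch that uses a made-up three-row frame as a stand-in for pokemon.csv; only the column names type_1 and type_2 follow the article, and the frame itself is an assumption for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the pokemon dataset (not read from pokemon.csv).
sample = pd.DataFrame({
    "name": ["Charizard", "Bulbasaur", "Houndour"],
    "type_1": ["Fire", "Grass", "Dark"],
    "type_2": ["Flying", "Poison", "Fire"],
})

# Distinct values reveal the capitalization actually used in the data.
print(sample["type_1"].unique())

# String comparison in .query() is case-sensitive:
# the lowercase predicate matches nothing, the capitalized one matches.
lower_hits = sample.query("type_1 == 'fire' or type_2 == 'fire'")
upper_hits = sample.query("type_1 == 'Fire' or type_2 == 'Fire'")
print(len(lower_hits), len(upper_hits))
```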

res = (df.copy()
         .query("legendary==0")
         .query("type_1=='Fire' or type_2=='Fire'")
         .drop("legendary", axis=1)
         .nsmallest(1,"total"))
res

Voilà, we got Slugma!

A few last words

The package is still at an early stage, so it might contain a few bugs. Please have a look at the GitHub repository and suggest improvements or extensions to the code. I gladly welcome any kind of constructive feedback, and feel free to contribute to Pandas-log as well! 😉
