<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Eyal Trabelsi</title>
    <description>The latest articles on DEV Community by Eyal Trabelsi (@eyaltrabelsi).</description>
    <link>https://dev.to/eyaltrabelsi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F257659%2Fd7edd9b4-d8ca-464b-91f1-453b258f3042.jpeg</url>
      <title>DEV Community: Eyal Trabelsi</title>
      <link>https://dev.to/eyaltrabelsi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/eyaltrabelsi"/>
    <language>en</language>
    <item>
      <title>Introducing Pandas-Log: a package for debugging pandas operations</title>
      <dc:creator>Eyal Trabelsi</dc:creator>
      <pubDate>Fri, 25 Oct 2019 05:03:57 +0000</pubDate>
      <link>https://dev.to/eyaltrabelsi/introducing-pandas-log-package-for-debugging-pandas-operations-47og</link>
      <guid>https://dev.to/eyaltrabelsi/introducing-pandas-log-package-for-debugging-pandas-operations-47og</guid>
      <description>&lt;p&gt;The pandas ecosystem has been invaluable to data science: today most data science tasks consist of a series of pandas steps that transform raw data into an understandable, usable format.&lt;/p&gt;

&lt;p&gt;The accuracy of these steps is crucial, so understanding unexpected results is crucial as well. Unfortunately, the ecosystem lacks tools for figuring out where those unexpected results come from.&lt;/p&gt;

&lt;p&gt;That’s why I created &lt;a href="https://github.com/eyaltrabelsi/pandas-log" rel="noopener noreferrer"&gt;Pandas-log&lt;/a&gt;. It provides metadata on each operation, which makes it possible to pinpoint issues. For example, after .query() it reports the number of rows that were filtered out.&lt;/p&gt;
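
&lt;p&gt;The core idea can be sketched with plain pandas: run an operation, compare the row counts before and after, and report the difference. The helper below, query_with_report, is a hypothetical name used only for illustration; it is not part of Pandas-log’s API and makes no claim about how the package is implemented. The small frame is hand-made example data, not the real dataset.&lt;/p&gt;

```python
import pandas as pd


def query_with_report(df, expr):
    """Hypothetical helper: run DataFrame.query and print how many rows it removed."""
    result = df.query(expr)
    removed = len(df) - len(result)
    print(f"query({expr!r}) removed {removed} of {len(df)} rows")
    return result


# Hand-made example data, not the real pokemon.csv.
df = pd.DataFrame({
    "name": ["Charmander", "Slugma", "Squirtle"],
    "type_1": ["Fire", "Fire", "Water"],
})

fire = query_with_report(df, "type_1 == 'Fire'")
# prints: query("type_1 == 'Fire'") removed 1 of 3 rows
```

&lt;p&gt;Returning the filtered frame unchanged keeps the helper usable inside a longer chain of steps, while the printed line carries the metadata.&lt;/p&gt;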

&lt;p&gt;As always, I believe it’s easier to understand with an example, so I will use the &lt;a href="https://www.kaggle.com/abcsds/pokemon" rel="noopener noreferrer"&gt;pokemon dataset&lt;/a&gt; to find &lt;b&gt;“who is the weakest non-legendary fire pokemon?”&lt;/b&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fmax%2F2160%2F1%2A7fOz2hg8dis04NyLD5lhug.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fmax%2F2160%2F1%2A7fOz2hg8dis04NyLD5lhug.jpeg"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  So who is the weakest fire pokemon?
&lt;/h3&gt;

&lt;p&gt;(The notebook code can be found here.)&lt;br&gt;
First, we import the relevant packages and read the pokemon dataset.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas as pd
import numpy as np
import pandas_log
df = pd.read_csv("pokemon.csv")
df.head(10)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fmax%2F2516%2F1%2AfB1TrMcO2G1B29_UOqNBbg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fmax%2F2516%2F1%2AfB1TrMcO2G1B29_UOqNBbg.png"&gt;&lt;/a&gt;&lt;/p&gt;
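
&lt;p&gt;The queries in this post refer to lowercase, underscore-separated column names such as type_1 and legendary. If your copy of the CSV uses headers with capitals and spaces instead (for example “Type 1”), .query() will not find those names. A minimal sketch for normalizing the headers, assuming that is the only difference; to_snake_case_columns is a hypothetical helper, not part of pandas or Pandas-log.&lt;/p&gt;

```python
import pandas as pd


def to_snake_case_columns(df):
    """Hypothetical helper: lowercase the headers and replace runs of
    non-alphanumeric characters with a single underscore ('Type 1' becomes 'type_1')."""
    out = df.copy()
    out.columns = (out.columns.str.strip()
                   .str.lower()
                   .str.replace(r"[^0-9a-z]+", "_", regex=True)
                   .str.strip("_"))
    return out


# Empty frame with example headers; in practice pass the frame read from the CSV.
raw = pd.DataFrame(columns=["Name", "Type 1", "Type 2", "Total", "Legendary"])
print(list(to_snake_case_columns(raw).columns))
# prints: ['name', 'type_1', 'type_2', 'total', 'legendary']
```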

&lt;h3&gt;
  
  
  To answer our question, “who is the weakest non-legendary fire pokemon?”, we will need to:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Filter out legendary pokemon using .query().&lt;/li&gt;
&lt;li&gt;Keep only fire pokemon using .query().&lt;/li&gt;
&lt;li&gt;Drop the legendary column using .drop().&lt;/li&gt;
&lt;li&gt;Keep the weakest pokemon among them using .nsmallest().&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In code, it will look something like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;res = (df.copy()
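         .query("legendary==0")                      # 1. filter out legendary pokemon
         .query("type_1=='fire' or type_2=='fire'")  # 2. keep only fire pokemon
         .drop("legendary", axis=1)                  # 3. drop the legendary column
         .nsmallest(1, "total"))                     # 4. keep the weakest (lowest total)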
res
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fmax%2F1897%2F1%2AGvurKVXyyr9BkVogFaMEmw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fmax%2F1897%2F1%2AGvurKVXyyr9BkVogFaMEmw.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  OH NOO!!! Our code does not work! We got an empty dataframe!!
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fmax%2F783%2F1%2Ahm0jEPQjOQLhgM3rKmw19g.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fmax%2F783%2F1%2Ahm0jEPQjOQLhgM3rKmw19g.gif"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If only there were a way to track down such issues! Fortunately, that’s exactly what &lt;a href="https://github.com/eyaltrabelsi/pandas-log" rel="noopener noreferrer"&gt;Pandas-log&lt;/a&gt; is for!&lt;br&gt;
By just wrapping our example in a small context manager, we get information printed to stdout that helps us find the issue.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;with pandas_log.enable():
    res = (df.copy()
             .query("legendary==0")
             .query("type_1=='fire' or type_2=='fire'")
             .drop("legendary", axis=1)
             .nsmallest(1,"total"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
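
&lt;p&gt;The with pandas_log.enable(): block switches logging on only for the code inside it. One common way to build such a switch in plain Python is a context manager that temporarily wraps a DataFrame method and restores the original on exit. The sketch below does this for .query() only; log_query_rows is a hypothetical name, and this illustrates the general pattern, not how Pandas-log is actually implemented.&lt;/p&gt;

```python
import contextlib

import pandas as pd


@contextlib.contextmanager
def log_query_rows():
    """Hypothetical context manager: while active, DataFrame.query prints
    the row count before and after each call."""
    original_query = pd.DataFrame.query

    def logged_query(self, expr, **kwargs):
        result = original_query(self, expr, **kwargs)
        # query(..., inplace=True) returns None, so only report real results.
        if result is not None:
            print(f"query({expr!r}): {len(self)} rows before, {len(result)} after")
        return result

    pd.DataFrame.query = logged_query
    try:
        yield
    finally:
        # Restore the original method even if the block raises.
        pd.DataFrame.query = original_query


# Hand-made example data, not the real pokemon.csv.
df = pd.DataFrame({"type_1": ["Fire", "Water"], "total": [250, 314]})
with log_query_rows():
    res = df.query("type_1=='fire'")
# prints: query("type_1=='fire'"): 2 rows before, 0 after
```

&lt;p&gt;Patching a class attribute like this affects every DataFrame in the process while it is active, which is one reason to confine it to a with block that always restores the original method.&lt;/p&gt;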



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fmax%2F3269%2F1%2AvkvklLnx7vPBFmmPb609Og.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fmax%2F3269%2F1%2AvkvklLnx7vPBFmmPb609Og.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After reading the output, it’s clear that the issue is in step 2: no rows remain after it, so something is wrong with the predicate “type_1=='fire' or type_2=='fire'”. Indeed, pokemon types start with a capital letter (“Fire”, not “fire”), so let’s run the fixed code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;res = (df.copy()
         .query("legendary==0")
         .query("type_1=='Fire' or type_2=='Fire'")  # types are capitalized: 'Fire', not 'fire'
         .drop("legendary", axis=1)
         .nsmallest(1,"total"))
res
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fmax%2F2437%2F1%2AsXsiWmjMvf_zymZWjSgIPg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fmax%2F2437%2F1%2AsXsiWmjMvf_zymZWjSgIPg.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Voilà, we got Slugma!!!!!!!!
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fmax%2F608%2F1%2AX3x06jC4gSJGLYtWcU1FlA.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmiro.medium.com%2Fmax%2F608%2F1%2AX3x06jC4gSJGLYtWcU1FlA.jpeg"&gt;&lt;/a&gt;&lt;/p&gt;
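
&lt;p&gt;The bug here came down to capitalization: the data spell the type “Fire”, while the first query looked for “fire”. Two habits help avoid this kind of bug: look at the actual spelling of the categories before writing a predicate, and, where capitalization carries no meaning, compare case-insensitively. A minimal sketch on a hand-made frame whose column names follow the queries in this post (the rows are illustrative, not read from the real CSV):&lt;/p&gt;

```python
import pandas as pd

# Hand-made example rows; column names follow the queries in this post.
df = pd.DataFrame({
    "name": ["Charizard", "Slugma", "Squirtle"],
    "type_1": ["Fire", "Fire", "Water"],
    "type_2": ["Flying", None, None],
})

# 1. Check how the categories are actually spelled before filtering on them.
print(sorted(df["type_1"].dropna().unique()))
# prints: ['Fire', 'Water']

# 2. Compare case-insensitively by lowercasing before the comparison.
#    Missing values in type_2 compare as False.
is_fire = (df["type_1"].str.lower().eq("fire")
           | df["type_2"].str.lower().eq("fire"))
fire = df[is_fire]
```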

&lt;h2&gt;
  
  
  A few last words
&lt;/h2&gt;

&lt;p&gt;The package is still in its early stages, so it might contain a few bugs. Please have a look at the GitHub repository and suggest improvements or extensions. I will gladly welcome any kind of constructive feedback, and feel free to contribute to &lt;a href="https://github.com/eyaltrabelsi/pandas-log" rel="noopener noreferrer"&gt;Pandas-log&lt;/a&gt; as well! 😉&lt;/p&gt;

</description>
      <category>python</category>
      <category>opensource</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
