<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Brigita Jon</title>
    <description>The latest articles on DEV Community by Brigita Jon (@brigita).</description>
    <link>https://dev.to/brigita</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3298065%2F1f4bb814-1b98-4c31-9b93-3aa9e23c4ee4.png</url>
      <title>DEV Community: Brigita Jon</title>
      <link>https://dev.to/brigita</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/brigita"/>
    <language>en</language>
    <item>
      <title>Start here/About Me</title>
      <dc:creator>Brigita Jon</dc:creator>
      <pubDate>Fri, 27 Jun 2025 20:49:02 +0000</pubDate>
      <link>https://dev.to/brigita/start-hereabout-me-5cp7</link>
      <guid>https://dev.to/brigita/start-hereabout-me-5cp7</guid>
      <description>&lt;h2&gt;
  
  
  Back-at-It: A Learning Journal from a Returning Data Analyst
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Quick Introduction&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Hi! I’m Brigita, a data analyst making my way back into tech after a long maternity leave.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Why I’m on Dev.to&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
I’ve decided to use this space as a learning journal — not to teach as an expert, but to share what I’m currently learning. Writing helps me reflect and organize the knowledge in my head :)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;What You Can Expect&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Topics I plan to write about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lessons from converting SAS code to Python
&lt;/li&gt;
&lt;li&gt;Building a small project using open data
&lt;/li&gt;
&lt;li&gt;Occasional thoughts on returning to tech after a long break&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;A Note on Comments &amp;amp; Feedback&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
I’m still early in this journey, so if you see something odd or wrong, I’m open to learning — just be kind :)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;What’s Next&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
My first article is about dealing with missing values in Python — inspired by how SAS does it. You can read it &lt;a href="https://dev.to/brigita/dealing-with-nothing-what-sas-taught-me-about-missing-values-in-python-as-a-beginner-103k"&gt;here&lt;/a&gt; if you're curious!&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
    </item>
    <item>
      <title>Dealing with Nothing: What SAS Taught Me About Missing Values in Python (as a Beginner)</title>
      <dc:creator>Brigita Jon</dc:creator>
      <pubDate>Fri, 27 Jun 2025 20:29:07 +0000</pubDate>
      <link>https://dev.to/brigita/dealing-with-nothing-what-sas-taught-me-about-missing-values-in-python-as-a-beginner-103k</link>
      <guid>https://dev.to/brigita/dealing-with-nothing-what-sas-taught-me-about-missing-values-in-python-as-a-beginner-103k</guid>
      <description>&lt;p&gt;&lt;em&gt;After a long break from data analysis, I’m #back-at-It! Currently, I’m working on a SAS-to-Python migration project, and I’m documenting what I learn — especially the unexpected things.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This post is all about one such surprise: &lt;strong&gt;missing values&lt;/strong&gt;.&lt;/p&gt;




&lt;p&gt;Isn’t it funny how much time data analysts spend handling... nothing? That’s a rhetorical question I found myself asking the other day.&lt;/p&gt;

&lt;p&gt;I’m talking about missing values, which, if handled improperly, can break logic, skew results, or silently wipe out rows during analysis.&lt;/p&gt;

&lt;p&gt;As I’ve been working through the SAS-to-Python conversion over the past couple of months, I’ve started noticing differences in how each handles missing data. Here’s a summary of what I found, plus a few practical examples at the end.&lt;/p&gt;




&lt;h2&gt;
  
  
  Different languages, different "nothings"
&lt;/h2&gt;

&lt;p&gt;Missing values may look the same across tools, but their behavior varies depending on the language, data type, and operation (comparison, filtering, grouping, etc.).&lt;/p&gt;

&lt;p&gt;Below is an overview comparing how SAS, Python, and pandas represent and work with missing values:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Missing value representations for the most common data types&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Data Type&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;SAS Missing Value&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Python Core&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;pandas Missing Value&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Numeric&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;.&lt;/code&gt; (dot), &lt;code&gt;.A–.Z&lt;/code&gt; (specials with sort order)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;float('nan')&lt;/code&gt;, &lt;code&gt;None&lt;/code&gt; (but &lt;code&gt;None&lt;/code&gt; breaks math)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;np.nan&lt;/code&gt;, &lt;code&gt;pd.NA&lt;/code&gt; (nullable &lt;code&gt;Int64&lt;/code&gt;, &lt;code&gt;Float64&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Character&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;''&lt;/code&gt; (empty string)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;None&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;None&lt;/code&gt;, &lt;code&gt;np.nan&lt;/code&gt;, &lt;code&gt;pd.NA&lt;/code&gt; (used with &lt;code&gt;string&lt;/code&gt; dtype)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dates/Times&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;.&lt;/code&gt; (still numeric, formatted as date)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;None&lt;/code&gt; (not valid in datetime ops)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;pd.NaT&lt;/code&gt;, &lt;code&gt;None&lt;/code&gt;, &lt;code&gt;pd.NA&lt;/code&gt; (used with &lt;code&gt;datetime64[ns]&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Boolean&lt;/td&gt;
&lt;td&gt;N/A (SAS has no native boolean type)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;None&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;pd.NA&lt;/code&gt;, &lt;code&gt;None&lt;/code&gt; (nullable Boolean dtype in pandas)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;References&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://pandas.pydata.org/docs/user_guide/missing_data.html#missing-data" rel="noopener noreferrer"&gt;pandas documentation – Working with missing data&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a001292604.htm" rel="noopener noreferrer"&gt;SAS documentation – Missing Values&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;SAS&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The dot (&lt;code&gt;.&lt;/code&gt;) is the universal marker for missing numeric values, including dates and true/false results, which SAS stores as numerics under the hood. SAS also supports 27 special missing values (&lt;code&gt;.A–.Z&lt;/code&gt;, &lt;code&gt;._&lt;/code&gt;) for situations where you need to distinguish between different kinds of missing data. I have never used or come across them in my work, though!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;pandas&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;pandas uses different missing value markers depending on the data type:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;np.nan&lt;/code&gt; for floating-point numbers
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pd.NaT&lt;/code&gt; for datetime types
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pd.NA&lt;/code&gt; for the newer, more consistent nullable types (such as &lt;code&gt;Int64&lt;/code&gt;, &lt;code&gt;string&lt;/code&gt;, and nullable Boolean)
&lt;/li&gt;
&lt;li&gt;For object or string columns, &lt;code&gt;None&lt;/code&gt; and &lt;code&gt;np.nan&lt;/code&gt; can also appear as missing values.&lt;/li&gt;
&lt;/ul&gt;
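&lt;p&gt;To see these markers for myself, I put together a small sketch. The Series names and sample values are made up for illustration; the snippet only prints which marker pandas stores for a missing entry in each dtype.&lt;/p&gt;

```python
import numpy as np
import pandas as pd

# Hypothetical sample Series, one per dtype family listed above.
floats = pd.Series([1.5, None])                            # float64
dates = pd.Series(pd.to_datetime(["2025-06-27", None]))    # datetime64[ns]
ints = pd.Series([1, None], dtype="Int64")                 # nullable integer
strings = pd.Series(["a", None], dtype="string")           # nullable string

# The second element of each Series is the missing one.
markers = {
    "float64": floats.iloc[1],
    "datetime64[ns]": dates.iloc[1],
    "Int64": ints.iloc[1],
    "string": strings.iloc[1],
}
for dtype_name, marker in markers.items():
    print(dtype_name, repr(marker))
```

&lt;p&gt;Note that I passed &lt;code&gt;None&lt;/code&gt; every time, and pandas converted it to whichever marker fits the column's dtype.&lt;/p&gt;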




&lt;h2&gt;
  
  
  Sorting behavior: SAS vs Python
&lt;/h2&gt;

&lt;p&gt;In SAS, when numeric values are sorted in ascending order, missing values always come first.&lt;/p&gt;

&lt;p&gt;Interestingly, SAS supports special missing values that even have their own rank when sorting:&lt;/p&gt;

&lt;p&gt;The standard numeric missing value (&lt;code&gt;.&lt;/code&gt;) is sorted before &lt;code&gt;.A&lt;/code&gt;, and both are sorted before &lt;code&gt;.Z&lt;/code&gt;. The &lt;code&gt;._&lt;/code&gt; is the smallest.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sorting numerical values in SAS&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Sort Order&lt;/th&gt;
&lt;th&gt;Symbol&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;smallest&lt;/td&gt;
&lt;td&gt;&lt;code&gt;._&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;underscore&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;period&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.A–.Z&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;special missing values A (smallest) through Z&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;-n&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;negative numbers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;zero&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;largest&lt;/td&gt;
&lt;td&gt;&lt;code&gt;+n&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;positive numbers&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Reference&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000989180.htm" rel="noopener noreferrer"&gt;SAS Documentation – Sorting Missing Values&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In pandas, on the other hand, missing values go to the end by default—but you can change this behavior using a parameter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;na_position&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;first&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
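&lt;p&gt;Here is a minimal sketch comparing the two orderings. The DataFrame and its column &lt;code&gt;x&lt;/code&gt; are hypothetical sample data, not taken from my project.&lt;/p&gt;

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value.
df = pd.DataFrame({"x": [3.0, np.nan, -1.0, 0.0]})

# pandas default: missing values are placed last.
default_order = df.sort_values(by="x")["x"].tolist()

# SAS-like ascending order: missing values are placed first.
sas_like_order = df.sort_values(by="x", na_position="first")["x"].tolist()

print(default_order)
print(sas_like_order)
```

&lt;p&gt;One thing to keep in mind: &lt;code&gt;na_position&lt;/code&gt; is independent of &lt;code&gt;ascending&lt;/code&gt;, so it has to be set explicitly for each sort direction if the goal is to reproduce the SAS order.&lt;/p&gt;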






&lt;h2&gt;
  
  
  Detecting missing values
&lt;/h2&gt;

&lt;p&gt;One of the first steps in working with missing values is detecting them.&lt;/p&gt;

&lt;p&gt;In SAS, you typically check for missing values using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sas"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="k"&gt;missing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;      &lt;span class="cm"&gt;/* universal */&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="k"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;           &lt;span class="cm"&gt;/* numeric */&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="k"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;          &lt;span class="cm"&gt;/* character */&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In Python, with pandas, you can use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isnull&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# (These are aliases)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
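&lt;p&gt;A short sketch of how this looks on scalars and on a whole DataFrame. The sample values and column names are assumptions chosen for illustration.&lt;/p&gt;

```python
import numpy as np
import pandas as pd

# pd.isna works on single values; 0 and the empty string are NOT missing.
scalars = [None, np.nan, pd.NA, pd.NaT, 0, ""]
flags = [bool(pd.isna(value)) for value in scalars]
print(flags)

# On a DataFrame, isna() returns a boolean mask; sum() counts per column.
df = pd.DataFrame({"num": [1.0, np.nan, 3.0], "txt": ["a", None, ""]})
missing_per_column = df.isna().sum()
print(missing_per_column)
```

&lt;p&gt;The empty string is worth a second look: in SAS &lt;code&gt;''&lt;/code&gt; is the missing character value, but pandas treats it as an ordinary string, so it is not counted here.&lt;/p&gt;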






&lt;h2&gt;
  
  
  Filling missing values in Python: doing something with nothing
&lt;/h2&gt;

&lt;p&gt;Once you've detected missing values, the next step is to decide how to fill them, or whether you need to fill them at all. There are several ways to handle missing values in pandas. Here's an overview of the most common methods:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Fill with a constant value&lt;/p&gt;

&lt;p&gt;You can replace all missing values with a specific value, like &lt;code&gt;0&lt;/code&gt; or &lt;code&gt;'unknown'&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;col&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Fill using another column&lt;/p&gt;

&lt;p&gt;You can use the values from another column to fill missing entries.&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;col_a&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;col_b&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;Alternatively, you can use the &lt;code&gt;combine_first()&lt;/code&gt; method, which behaves like a SQL &lt;code&gt;COALESCE&lt;/code&gt; operation:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;col_a&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;combine_first&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;col_b&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Forward and backward fill&lt;/p&gt;

&lt;p&gt;You can use forward fill or backward fill to propagate non-missing values forward or backward to fill gaps:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ffill&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bfill&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;You can also limit how far to propagate with the &lt;code&gt;limit&lt;/code&gt; parameter:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ffill&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Interpolation&lt;/p&gt;

&lt;p&gt;For numeric columns, you can use interpolation. By default, pandas uses linear interpolation:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;col&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;interpolate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Group-based filling&lt;/p&gt;

&lt;p&gt;If you want to fill missing values within specific groups, you can use &lt;code&gt;groupby()&lt;/code&gt; in combination with a fill method. For example, filling missing values within each group using forward fill:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ffill&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Statistical filling&lt;/p&gt;

&lt;p&gt;Another approach is to fill missing values based on statistical measures like the mean, median, or mode of the column:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;col&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;col&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;col&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;col&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;median&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;col&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;col&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;


&lt;p&gt;To perform statistical filling within groups:&lt;br&gt;
&lt;/p&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;col&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;group&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;col&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/li&gt;
&lt;/ol&gt;
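&lt;p&gt;To compare the methods side by side, here is a small sketch on one made-up DataFrame. The column names (&lt;code&gt;group&lt;/code&gt;, &lt;code&gt;col_a&lt;/code&gt;, &lt;code&gt;col_b&lt;/code&gt;) and values are assumptions for illustration, and forward fill is written as &lt;code&gt;ffill(limit=1)&lt;/code&gt;.&lt;/p&gt;

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: col_a has gaps, col_b is complete.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b"],
    "col_a": [1.0, np.nan, np.nan, np.nan, 10.0],
    "col_b": [9.0, 8.0, 7.0, 6.0, 5.0],
})

constant = df["col_a"].fillna(0)                      # fill with a constant
coalesced = df["col_a"].combine_first(df["col_b"])    # fall back to col_b
forward_once = df["col_a"].ffill(limit=1)             # propagate one step only
interpolated = df["col_a"].interpolate()              # linear by default
group_mean = df.groupby("group")["col_a"].transform(
    lambda s: s.fillna(s.mean())                      # mean within each group
)

summary = pd.DataFrame({
    "constant": constant,
    "coalesced": coalesced,
    "forward_once": forward_once,
    "interpolated": interpolated,
    "group_mean": group_mean,
})
print(summary)
```

&lt;p&gt;Every method returns a new Series and leaves &lt;code&gt;df&lt;/code&gt; untouched, which makes it easy to compare the results before committing to one.&lt;/p&gt;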

&lt;p&gt;&lt;strong&gt;In-place updates&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The fill methods return a new object by default, so to update the original DataFrame you assign the result back to the column. The &lt;code&gt;inplace=True&lt;/code&gt; parameter exists too, but calling it on a selected column such as &lt;code&gt;df['col']&lt;/code&gt; is chained assignment, which recent pandas versions warn about and which may leave the original DataFrame unchanged. The assignment form is the safer choice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;col&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;col&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  How did Python get me?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  And how I made it act like SAS
&lt;/h3&gt;

&lt;p&gt;I've heard a joke online that stuck with me:  &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Why does pandas treat NaN == NaN as False?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;em&gt;Because even missing values have trust issues.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Turns out it’s funny because it’s true. While converting SAS code to Python/pandas, I ran into several moments where things didn’t behave quite the way I expected.&lt;/p&gt;

&lt;p&gt;Here are a few practical examples where translating from SAS to Python didn’t go as expected — and how to make pandas behave more like SAS.&lt;/p&gt;




&lt;h3&gt;
  
  
  1. &lt;code&gt;x &amp;lt; 0&lt;/code&gt;: SAS includes missing, pandas excludes
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;In SAS:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sas"&gt;&lt;code&gt;&lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="nv"&gt;filtered&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;set&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="k"&gt;x&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;run&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will include standard missing values (&lt;code&gt;.&lt;/code&gt;), because SAS treats &lt;code&gt;.&lt;/code&gt; as less than any number.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In pandas:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



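&lt;p&gt;This excludes missing values: ordering comparisons with &lt;code&gt;NaN&lt;/code&gt; or &lt;code&gt;NaT&lt;/code&gt; evaluate to False, and comparisons with &lt;code&gt;pd.NA&lt;/code&gt; evaluate to &lt;code&gt;pd.NA&lt;/code&gt;, which boolean indexing treats as False.&lt;/p&gt;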

&lt;p&gt;&lt;strong&gt;To mimic SAS:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;isna&lt;/span&gt;&lt;span class="p"&gt;())]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
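&lt;p&gt;A runnable sketch of the difference, using made-up sample data. I wrote the comparison with &lt;code&gt;Series.lt(0)&lt;/code&gt;, which is the method form of "less than zero"; that choice is mine and not required.&lt;/p&gt;

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value.
df = pd.DataFrame({"x": [-2.0, np.nan, 0.0, 5.0]})

# Plain pandas filter: the missing row is dropped.
pandas_default = df[df["x"].lt(0)]

# SAS-like filter: keep negative values AND missing values.
sas_like = df[df["x"].lt(0) | df["x"].isna()]

print(len(pandas_default), len(sas_like))
```

&lt;p&gt;The row counts differ by exactly the number of missing values, which is the kind of silent discrepancy that shows up when comparing SAS and pandas outputs.&lt;/p&gt;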






&lt;h3&gt;
  
  
  2. &lt;code&gt;NaN == NaN&lt;/code&gt; is False, but &lt;code&gt;. == .&lt;/code&gt; is True
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;In SAS:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sas"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt; &lt;span class="k"&gt;put&lt;/span&gt; &lt;span class="s2"&gt;"Equal"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="cm"&gt;/* Outputs: Equal */&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;In Python:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nan&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nan&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above returns &lt;code&gt;False&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to match SAS logic:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;equal_or_missing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;

&lt;span class="nf"&gt;equal_or_missing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nan&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nan&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above returns &lt;code&gt;True&lt;/code&gt; now.&lt;/p&gt;
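&lt;p&gt;The function above works on single values. For whole columns, a vectorized variant might look like the sketch below. The function name &lt;code&gt;equal_or_missing_series&lt;/code&gt; and the sample Series are hypothetical, and &lt;code&gt;np.logical_and&lt;/code&gt; is just one way to combine the two masks element-wise.&lt;/p&gt;

```python
import numpy as np
import pandas as pd

def equal_or_missing_series(a, b):
    """Element-wise equality where two missing values count as equal."""
    both_missing = np.logical_and(a.isna(), b.isna())
    return a.eq(b) | both_missing

# Hypothetical sample data: position 1 is missing on both sides.
left = pd.Series([1.0, np.nan, 3.0, np.nan])
right = pd.Series([1.0, np.nan, 4.0, 2.0])

result = equal_or_missing_series(left, right)
print(result.tolist())
```

&lt;p&gt;A value that is missing on only one side still compares as not equal, which matches how SAS would compare &lt;code&gt;.&lt;/code&gt; with an actual number.&lt;/p&gt;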




&lt;h3&gt;
  
  
  3. Grouping with missing values: pandas excludes them
&lt;/h3&gt;

&lt;p&gt;In SAS, &lt;code&gt;PROC SQL&lt;/code&gt; treats &lt;code&gt;.&lt;/code&gt; as a valid group and includes it in &lt;code&gt;GROUP BY&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In pandas, &lt;code&gt;groupby()&lt;/code&gt; excludes missing values in the grouping column by default.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;To include them:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dropna&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
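&lt;p&gt;Here is a small sketch of both behaviors on made-up data. The column names &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;value&lt;/code&gt; are assumptions for illustration.&lt;/p&gt;

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two rows have a missing grouping key.
df = pd.DataFrame({
    "x": ["a", "a", np.nan, np.nan],
    "value": [1.0, 3.0, 10.0, 20.0],
})

# Default: rows with a missing key are silently left out.
default_groups = df.groupby("x")["value"].mean()

# dropna=False: the missing key becomes its own group, as in PROC SQL.
with_missing = df.groupby("x", dropna=False)["value"].mean()

print(default_groups)
print(with_missing)
```

&lt;p&gt;With the default setting, the two rows with a missing key do not appear anywhere in the result, so totals computed from the grouped output will not add up to the totals of the original data.&lt;/p&gt;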






&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;It’s ironic — some of the trickiest bugs come from values that don’t even exist.&lt;br&gt;
In SAS, missing values are consistent and predictable.&lt;br&gt;
In Python/pandas, you get flexibility — but also more room for surprises.&lt;/p&gt;

&lt;p&gt;If you're also transitioning from SAS to Python or have your own gotchas around missing values, please share!&lt;/p&gt;

&lt;p&gt;&lt;code&gt;#Back-at-It&lt;/code&gt; &lt;code&gt;#Buggy-but-getting-there&lt;/code&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>dataanalysis</category>
      <category>sas</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
