After a long break from data analysis, I’m #back-at-It! Currently, I’m working on a SAS-to-Python migration project, and I’m documenting what I learn — especially the unexpected things.
This post is all about one such surprise: missing values.
Funny how much time data analysts spend handling... nothing. That’s a rhetorical question I found myself thinking the other day.
I’m talking about missing values, which, if handled improperly, can break logic, skew results, or silently wipe out rows during analysis.
As I’ve been working through the SAS-to-Python conversion over the past couple of months, I’ve started noticing differences in how each handles missing data. Here’s a summary of what I found, plus a few practical examples at the end.
Different languages, different "nothings"
Missing values may look the same across tools, but their behavior varies depending on the language, data type, and operation (comparison, filtering, grouping, etc.).
Below is an overview comparing how SAS, Python, and pandas represent and work with missing values:
Missing values representations in most common data types
| Data Type | SAS Missing Value | Python Core | pandas Missing Value | 
|---|---|---|---|
| Numeric | .(dot),.A–.Z(specials with sort order) | float('nan'),None(butNonebreaks math) | np.nan,pd.NA(nullableInt64,Float64) | 
| Character | ''(empty string) | None | None,np.nan,pd.NA(used withstringdtype) | 
| Dates/Times | .(still numeric, formatted as date) | None(not valid in datetime ops) | pd.NaT,None,pd.NA(used withdatetime64[ns]) | 
| Boolean | N/A (SAS has no native boolean type) | None | pd.NA,None(nullable Boolean dtype in pandas) | 
References
SAS
The dot (.) is the universal marker for missing numeric values—including dates and booleans, which are stored as numerics under the hood. SAS also supports 27 special missing values (.A–.Z, ._) for  data situations when you need to categorize the missing values. I have never used or met them in my work though!
pandas
pandas uses different missing value markers depending on the data type:
- 
np.nanfor floating-point numbers
- 
pd.NaTfor datetime types
- 
pd.NAfor the newer, more consistent nullable types (such asInt64,string, and nullable Boolean)
- For object or string columns, Noneandnp.nancan also appear as missing values.
Sorting behavior: SAS vs Python
In SAS, while sorting numeric values, the missing values will always appear first.
Interestingly, SAS supports special missing values that even have their own rank when sorting:
The standard numeric missing value (.) is sorted before .A, and both are sorted before .Z. The ._ is the smallest.
Sorting numerical values in SAS
| Sort Order | Symbol | Description | 
|---|---|---|
| smallest | ._ | underscore | 
| . | period | |
| .A–.Z | special missing values A (smallest) through Z | |
| -n | negative numbers | |
| 0 | zero | |
| largest | +n | positive numbers | 
Reference
In pandas, on the other hand, missing values go to the end by default—but you can change this behavior using a parameter:
df.sort_values(by='x', na_position='first')
Detecting missing values
One of the first steps in working with missing values is detecting them.
In SAS, you typically check for missing values using:
if missing(x)      /* universal */
if x = .           /* numeric */
if x = ''          /* character */
In Python, you can use:
pd.isna(x)
pd.isnull(x)   # (These are aliases)
Filling missing values in Python: doing something with nothing
Once you've detected missing values, the next step is to decide how to fill them or you need to do it at all. There are several ways to handle missing values in pandas. Here's an overview of the most common methods:
- 
Fill with a constant value You can replace all missing values with a specific value, like 0or'unknown':
 df['col'].fillna(0) df.fillna('unknown')
- 
Fill using another column You can use the values from another column to fill missing entries. 
 df['col_a'].fillna(df['col_b'])Alternatively, you can use the combine_first()method, which behaves like a SQLCOALESCEoperation:
 df['col_a'].combine_first(df['col_b'])
- 
Forward and backward fill You can use forward fill or backward fill to propagate non-missing values forward or backward to fill gaps: 
 df.fillna(method='ffill') df.fillna(method='bfill')You can also limit how far to propagate with the limitparameter:
 df.fillna(method='ffill', limit=1)
- 
Interpolation For numeric columns, you can use interpolation. By default, pandas uses linear interpolation: 
 df['col'].interpolate()
- 
Group-based filling If you want to fill missing values within specific groups, you can use groupby()in combination with a fill method. For example, filling missing values within each group using forward fill:
 df.groupby('id')['value'].transform(lambda x: x.fillna(method='ffill'))
- 
Statistical filling Another approach is to fill missing values based on statistical measures like the mean, median, or mode of the column: 
 df['col'].fillna(df['col'].mean()) df['col'].fillna(df['col'].median()) df['col'].fillna(df['col'].mode()[0])To perform statistical filling within groups: 
 df['col'] = df.groupby('group')['col'].transform(lambda x: x.fillna(x.mean()))
Inplace updates
If you want to modify the original DataFrame without creating a copy, you can use the inplace=True parameter:
df['col'].fillna(0, inplace=True)
How did Python get me?
And how I made it act Like SAS
I've heard a joke online that stuck with me:
Why does pandas treat NaN == NaN as False?
Because even missing values have trust issues.
Turns out it’s funny because it’s true. While converting SAS code to Python/pandas, I ran into several moments where things didn’t behave quite the way I expected.
Here are a few practical examples where translating from SAS to Python didn’t go as expected — and how to make pandas behave more like SAS.
  
  
  1. x < 0: SAS includes missing, pandas excludes
In SAS:
data filtered;
  set test;
  where x < 0;
run;
This will include standard missing values (.), because SAS treats . as less than any number.
In pandas:
df[df['x'] < 0]
This excludes missing values, because comparisons with NaN, pd.NA, or NaT always return False.
To mimic SAS:
df[(df['x'] < 0) | (df['x'].isna())]
  
  
  2. NaN == NaN is False, but . == . is True
In SAS:
if . = . then put "Equal";
/* Outputs: Equal */
In Python:
import numpy as np
print(np.nan == np.nan)
The above returns False.
How to match SAS logic:
import pandas as pd
def equal_or_missing(a, b):
    if pd.isna(a) and pd.isna(b):
        return True
    return a == b
equal_or_missing(np.nan, np.nan)
The above returns True now.
3. Grouping with missing values: pandas excludes them
In SAS, PROC SQL treats . as a valid group and includes it in GROUP BY.
In pandas, groupby() excludes missing values in the grouping column by default.
To include them:
df.groupby('x', dropna=False).mean()
Final thoughts
It’s ironic — some of the trickiest bugs come from values that don’t even exist.
In SAS, missing values are consistent and predictable.
In Python/pandas, you get flexibility — but also more room for surprises.
If you're also transitioning from SAS to Python or have your own gotchas around missing values, please share!
#Back-at-It #Buggy-but-getting-there
 
 
              
 
    
Top comments (0)