Brigita Jon

Posted on Jun 27 • Edited on Jun 29

Dealing with Nothing: What SAS Taught Me About Missing Values in Python (as a Beginner)

#python #dataanalysis #sas #beginners

After a long break from data analysis, I’m #back-at-It! Currently, I’m working on a SAS-to-Python migration project, and I’m documenting what I learn — especially the unexpected things.

This post is all about one such surprise: missing values.

Funny how much time data analysts spend handling... nothing. That’s a rhetorical question I found myself thinking the other day.

I’m talking about missing values, which, if handled improperly, can break logic, skew results, or silently wipe out rows during analysis.

As I’ve been working through the SAS-to-Python conversion over the past couple of months, I’ve started noticing differences in how each handles missing data. Here’s a summary of what I found, plus a few practical examples at the end.

Different languages, different "nothings"

Missing values may look the same across tools, but their behavior varies depending on the language, data type, and operation (comparison, filtering, grouping, etc.).

Below is an overview comparing how SAS, Python, and pandas represent and work with missing values:

Missing values representations in most common data types

Data Type	SAS Missing Value	Python Core	pandas Missing Value
Numeric	`.` (dot), `.A–.Z` (specials with sort order)	`float('nan')`, `None` (but `None` breaks math)	`np.nan`, `pd.NA` (nullable `Int64`, `Float64`)
Character	`''` (empty string)	`None`	`None`, `np.nan`, `pd.NA` (used with `string` dtype)
Dates/Times	`.` (still numeric, formatted as date)	`None` (not valid in datetime ops)	`pd.NaT`, `None`, `pd.NA` (used with `datetime64[ns]`)
Boolean	N/A (SAS has no native boolean type)	`None`	`pd.NA`, `None` (nullable Boolean dtype in pandas)

References

SAS

The dot (.) is the universal marker for missing numeric values—including dates and booleans, which are stored as numerics under the hood. SAS also supports 27 special missing values (.A–.Z, ._) for data situations when you need to categorize the missing values. I have never used or met them in my work though!

pandas

pandas uses different missing value markers depending on the data type:

np.nan for floating-point numbers
pd.NaT for datetime types
pd.NA for the newer, more consistent nullable types (such as Int64, string, and nullable Boolean)
For object or string columns, None and np.nan can also appear as missing values.

Sorting behavior: SAS vs Python

In SAS, while sorting numeric values, the missing values will always appear first.

Interestingly, SAS supports special missing values that even have their own rank when sorting:

The standard numeric missing value (.) is sorted before .A, and both are sorted before .Z. The ._ is the smallest.

Sorting numerical values in SAS

Sort Order	Symbol	Description
smallest	`._`	underscore
	`.`	period
	`.A–.Z`	special missing values A (smallest) through Z
	`-n`	negative numbers
	`0`	zero
largest	`+n`	positive numbers

Reference

SAS Documentation – Sorting Missing Values

In pandas, on the other hand, missing values go to the end by default—but you can change this behavior using a parameter:

df.sort_values(by='x', na_position='first')

Detecting missing values

One of the first steps in working with missing values is detecting them.

In SAS, you typically check for missing values using:

if missing(x)      /* universal */
if x = .           /* numeric */
if x = ''          /* character */

In Python, you can use:

pd.isna(x)
pd.isnull(x)   # (These are aliases)

Filling missing values in Python: doing something with nothing

Once you've detected missing values, the next step is to decide how to fill them or you need to do it at all. There are several ways to handle missing values in pandas. Here's an overview of the most common methods:

Fill with a constant value

You can replace all missing values with a specific value, like 0 or 'unknown':
```
df['col'].fillna(0)
df.fillna('unknown')
```
Fill using another column

You can use the values from another column to fill missing entries.
```
df['col_a'].fillna(df['col_b'])
```
Alternatively, you can use the combine_first() method, which behaves like a SQL COALESCE operation:
```
df['col_a'].combine_first(df['col_b'])
```
Forward and backward fill

You can use forward fill or backward fill to propagate non-missing values forward or backward to fill gaps:
```
df.fillna(method='ffill')
df.fillna(method='bfill')
```
You can also limit how far to propagate with the limit parameter:
```
df.fillna(method='ffill', limit=1)
```
Interpolation

For numeric columns, you can use interpolation. By default, pandas uses linear interpolation:
```
df['col'].interpolate()
```
Group-based filling

If you want to fill missing values within specific groups, you can use groupby() in combination with a fill method. For example, filling missing values within each group using forward fill:
```
df.groupby('id')['value'].transform(lambda x: x.fillna(method='ffill'))
```

Statistical filling

Another approach is to fill missing values based on statistical measures like the mean, median, or mode of the column:

df['col'].fillna(df['col'].mean())
df['col'].fillna(df['col'].median())
df['col'].fillna(df['col'].mode()[0])

To perform statistical filling within groups:

df['col'] = df.groupby('group')['col'].transform(lambda x: x.fillna(x.mean()))

Inplace updates

If you want to modify the original DataFrame without creating a copy, you can use the inplace=True parameter:

df['col'].fillna(0, inplace=True)

How did Python get me?

And how I made it act Like SAS

I've heard a joke online that stuck with me:

Why does pandas treat NaN == NaN as False?

Because even missing values have trust issues.

Turns out it’s funny because it’s true. While converting SAS code to Python/pandas, I ran into several moments where things didn’t behave quite the way I expected.

Here are a few practical examples where translating from SAS to Python didn’t go as expected — and how to make pandas behave more like SAS.

1. `x < 0`: SAS includes missing, pandas excludes

In SAS:

data filtered;
  set test;
  where x < 0;
run;

This will include standard missing values (.), because SAS treats . as less than any number.

In pandas:

df[df['x'] < 0]

This excludes missing values, because comparisons with NaN, pd.NA, or NaT always return False.

To mimic SAS:

df[(df['x'] < 0) | (df['x'].isna())]

2. `NaN == NaN` is False, but `. == .` is True

In SAS:

if . = . then put "Equal";
/* Outputs: Equal */

In Python:

import numpy as np
print(np.nan == np.nan)

The above returns False.

How to match SAS logic:

import pandas as pd

def equal_or_missing(a, b):
    if pd.isna(a) and pd.isna(b):
        return True
    return a == b

equal_or_missing(np.nan, np.nan)

The above returns True now.

3. Grouping with missing values: pandas excludes them

In SAS, PROC SQL treats . as a valid group and includes it in GROUP BY.

In pandas, groupby() excludes missing values in the grouping column by default.

To include them:

df.groupby('x', dropna=False).mean()

Final thoughts

It’s ironic — some of the trickiest bugs come from values that don’t even exist.
In SAS, missing values are consistent and predictable.
In Python/pandas, you get flexibility — but also more room for surprises.

If you're also transitioning from SAS to Python or have your own gotchas around missing values, please share!

#Back-at-It #Buggy-but-getting-there

DEV Community

Dealing with Nothing: What SAS Taught Me About Missing Values in Python (as a Beginner)

Different languages, different "nothings"

Sorting behavior: SAS vs Python

Detecting missing values

Filling missing values in Python: doing something with nothing

How did Python get me?

And how I made it act Like SAS

1. `x < 0`: SAS includes missing, pandas excludes

2. `NaN == NaN` is False, but `. == .` is True

3. Grouping with missing values: pandas excludes them

Final thoughts

Top comments (0)

Different languages, different "nothings"

Sorting behavior: SAS vs Python

Detecting missing values

Filling missing values in Python: doing something with nothing

How did Python get me?

And how I made it act Like SAS

1. x < 0: SAS includes missing, pandas excludes

2. NaN == NaN is False, but . == . is True

3. Grouping with missing values: pandas excludes them

Final thoughts

1. `x < 0`: SAS includes missing, pandas excludes

2. `NaN == NaN` is False, but `. == .` is True