Handling NULL in the DBs

IrinaBezgina — Wed, 03 May 2023 21:00:52 +0000

NULL represents an unknown value or its absence. But an absence of value doesn't mean an absence of problems with it.

First of all different DBMSs may behave differently when handling NULL values (depending on SQL dialect or architecture). But I'll concentrate on common patterns in this topic. Let's have a look at the typical hurdles and ways to deal with them based on the two main situations (data modeling and querying the data).

Data Modelling

Any data storage begins with data modeling. And it would be useful to understand how can NULL values affect your database design.

Primary key uniquely identifies each record in a table, and cannot contain NULL values.
Avoid situations when foreign key is a NULL value. It might lead to broken relationships between tables. You could avoid it setting NOT NULL constraint.
Not all DBMSs treat NULL as a unique value. So it's better to avoid unique constraints in the columns which possibly could have NULL values.
Partitioning by the column which potentially has NULL values could lead to uneven data distribution and hence reduce the benefits of partitioning. You may encounter a similar problem in the bucketing/clustering process.

Querying the data

Generally any arithmetic operation involving a NULL will result in NULL. For example, 120 - NULL = NULL.
Be careful when comparing values potentially containing NULLs. NULL = NULL is not true, and neither is NULL <> NULL. Instead, you should use the IS NULL or IS NOT NULL operators to check for NULL values.
Aggregate functions usually ignore NULL values. However, the COUNT function counts NULL values when using the COUNT(*).
Logical operations involving NULL values could result in NULL.
When you filter the values in the column containing NULLs you may lose some important data. For example, you have a table with columns Name and City, and you want to select all the names from all the cities except New York and Washington. Using City IS NOT IN ('New York', 'Washington') will result in rows with the NULL city being lost.
When you INNER JOIN tables if either of the join columns contains a NULL value, the row will not be included in the result set.

Classic handling approaches

The general approach to handling NULL values includes:

Use COALESCE (or analogs depending on DBMS) for the columns which potentially can have a NULL value.
Use IS NULL or IS NOT NULL for comparison operations.
Use OUTER JOIN or handle the JOIN clause using COALESCE when comparing columns possibly containing NULL values.
Create Unit tests and Quality metrics for your data.
Experiment more. Knowledge comes with practice.

DEV Community: IrinaBezgina

Handling NULL in the DBs

Data Modelling

Querying the data

Classic handling approaches