Dear Readers,
NULL in data is like a weed, it can mess up your analysis and calculations, and ultimately bleed the value out of your analytics work. In this article, we’ll dive into what NULL is, why it exists, when it appears, and how to deal with it in Snowflake ?
So, without further ado, let’s meet this culprit.
What NULL?
In SQL, NULL is basically the database equivalent of a shrug.It is a special marker that represents a missing or unknown value. It's not the same as an empty string ('') or the number zero (0). It simply means “I don’t know” or “there’s no value here at all.”
Why NULL Exists?
NULL is essential because sometimes your data just doesn’t exist yet, or never will. NULL allows the database to acknowledge the gap without faking a value.
Eg:
- A customer doesn’t have a middle name.
- A product doesn’t have a discount yet.
- A delivery date is unknown at the time of record insertion.
When NULL Appears?
NULL occurs anytime there is a lack of value from the user.
Common reasons include:
- Optional fields with no provided value.
- Data not yet available (future delivery dates, pending approvals)
- Inapplicable values (someone’s “date of divorce” who never married)
How to Deal with NULL in Snowflake?
Snowflake provides multiple functions and methods to deal with NULL dataset aka unknown data similar to other databases. I have listed some of them below:
- Use IS NULL / IS NOT NULL to identify the NULL data.
- Use COALESCE(col, default) or IFNULL(col, default) to replace them in select , joins, aggregates etc
- Use explicit filter on WHERE clauses to include NULL rows as it can can drop NULL rows unexpectedly.
- Use NVL, CASE WHEN, or similar functions to handle them in calculations or strings.
- Comparisons with NULL return UNKNOWN in SQL’s three-valued logic. Use COALESCE or IS NULL checks to avoid surprises.
Eg:
==Using NULL/NOT NULL
-- Finds all employees without a manager
SELECT name FROM employees WHERE manager_id IS NULL;
-- Finds all employees with a manager
SELECT name FROM employees WHERE manager_id IS NOT NULL;
==Using COALESCE
-- Calculates total compensation, treating a NULL bonus as 0
SELECT salary + COALESCE(bonus, 0) AS total_compensation FROM employees
The Trouble NULL Causes:
1. Comparison Traps
NULL comparisons are a common pitfall because standard operators like = or != don’t work as expected. Any comparison involving NULL returns UNKNOWN, not TRUE or FALSE. Use IS NULL or IS NOT NULL to handle the traps.
Eg:
--Evaluates to UNKNOWN when NULL = NULL
SELECT *
FROM employees
WHERE manager_id = NULL;
-- The correct way to find employees with a NULL manager_id is:
SELECT *
FROM employees
WHERE manager_id IS NULL;
2. Math & Concatenation Failures:
NULL acts like a virus in arithmetic and string concatenation operations. Any operation involving NULL will result in NULL.
Eg:
--Return of NULL total+compensation when bonus is NULL:
SELECT salary + bonus AS total_compensation
FROM employees;
--The correct way to add bonus is use COALESCE :
SELECT salary + COALESCE(bonus, 0) AS total_compensation
FROM employees;
3. Aggregate Surprises:
Aggregate functions like SUM() and AVG() ignore NULL values. This can lead to misleading results if you’re not aware of it.
Eg:
--Excludes NULL rows
SELECT AVG(salary) AS avg_salary, SUM(salary) AS total_salary
FROM employees;
--Use COALESCE to treat NULL salaries as 0 to get more accurate company-wide average
SELECT AVG(COALESCE(salary, 0)) FROM employees;
4. Join Issues:
NULL values in join keys never match each other. If an employee has a NULL in manager_id, that row won’t link to any id in the employees table even if other employees also have NULL as manager_id.
Use a LEFT JOIN + COALESCE to tackle the NULL issue.In following demonstrated way, you keep all employees in the output, and NULL managers are clearly labeled without risking false matches.
Eg:
--This INNER JOIN excludes employees with no manager_id
SELECT
E.name AS employee,
M.name AS manager
FROM employees AS E
INNER JOIN
employees AS M
ON E.manager_id = M.id;
--Use COALESCE to include employees without a manager
SELECT
E.name AS employee,
COALESCE(M.name, 'No Manager') AS manager
FROM employees E
LEFT JOIN employees M
ON E.manager_id = M.id;
5.Logic Surprises
In SQL’s three-valued logic (TRUE, FALSE, UNKNOWN), an UNKNOWN result in a WHERE clause acts like FALSE and discards the row. However, UNKNOWN can have different effects in other clauses such as CASE.
Eg:
--Logic Mishandling for NULL data.
SELECT
CASE
WHEN bonus > 10000 THEN 'High Bonus'
WHEN bonus > 5000 THEN 'Mid Bonus'
ELSE 'Low or No Bonus'
END AS bonus_category
FROM employees;
-- Explicitly checks for NULL first
SELECT
CASE
WHEN bonus IS NULL THEN 'No Bonus'
WHEN bonus > 10000 THEN 'High Bonus'
WHEN bonus > 5000 THEN 'Mid Bonus'
ELSE 'Low Bonus'
END AS bonus_category
FROM employees;
Closing thoughts,
NULLs aren’t a bug, they’re part of SQL’s design.NULL is tricky. The safest approach is to scan your dataset for every attribute that could be NULL, then apply handling strategies suited to your business context. In Snowflake, a few well-placed COALESCE, IS NULL, and CASE statements can save you from major data quality headaches.
Hope you enjoyed the article! If you like my writing, feel free to connect with me on Linkedin and follow for more such interesting reads.
Top comments (0)