NULL in a gbase database doesn't behave like an ordinary value, nor is it simply a blank. It affects comparisons, deduplication, aggregations, and conditional logic in ways that often contradict intuition. Many "data is fine but results are off" issues trace back to mishandled NULL semantics.
First, Separate Three Easily Confused Concepts
| Value | Meaning | Most Common Misjudgment |
|---|---|---|
| NULL | Unknown, missing, not assigned | Treated as an empty string |
| Empty string '' | Zero‑length string | Treated as NULL |
| Default value | Filled by business rules or DDL | Treated as a real business value |
When these three coexist, you get the classic "some rows match, some don't" confusion.
Common Symptoms
-
WHERE col = ''returns nothing, but rows clearly exist. -
COUNT(col)is far smaller thanCOUNT(*)— business thinks data was lost. - A mysterious "empty group" appears after
GROUP BY, and nobody can say whether it's NULL or an empty string. - After a
LEFT JOIN, dimension columns become NULL, and subsequent filters or aggregations silently shift the result. -
CASE WHEN col = NULL THEN ...looks logical but produces wrong outcomes.
How to Diagnose
Step 1: Quantify the distribution of NULLs, empty strings, and valid values
Don't guess — query the actual distribution:
SELECT
COUNT(*) AS total_cnt,
SUM(CASE WHEN cust_level IS NULL THEN 1 ELSE 0 END) AS null_cnt,
SUM(CASE WHEN cust_level = '' THEN 1 ELSE 0 END) AS empty_cnt,
SUM(CASE WHEN cust_level IS NOT NULL AND cust_level <> '' THEN 1 ELSE 0 END) AS valid_cnt
FROM dwd_customer;
Often, NULLs and empty strings come from completely different sources.
Step 2: Verify the aggregation function matches the business intent
SELECT COUNT(*) AS total_rows, COUNT(cust_level) AS non_null_rows FROM dwd_customer;
If the requirement is "total row count," don't casually write COUNT(column). The core question is: are you counting records, or non‑null values?
Step 3: Check whether conditions silently exclude NULLs
-- This does not include NULL rows
SELECT * FROM dwd_customer WHERE cust_level <> 'VIP';
-- To include unknowns:
SELECT * FROM dwd_customer WHERE cust_level <> 'VIP' OR cust_level IS NULL;
A Realistic Example
A customer table has channel_code values that are a mix of real values, NULLs, and empty strings. A business query counts users by channel:
SELECT channel_code, COUNT(*) AS cust_cnt
FROM dwd_customer
GROUP BY channel_code;
An "empty channel" row appears, sparking debate. A more robust approach is to standardize the semantics before aggregation:
SELECT
CASE
WHEN channel_code IS NULL THEN 'NULL_VALUE'
WHEN channel_code = '' THEN 'EMPTY_STRING'
ELSE channel_code
END AS channel_tag,
COUNT(*) AS cust_cnt
FROM dwd_customer
GROUP BY channel_tag;
This shifts the conversation from "the result feels wrong" to "which specific class of values is affecting the output."
SQL Areas Most Affected by NULL
| SQL Type | Common Deviation | What to Check First |
|---|---|---|
| Filtering | Missing NULL rows | Are IS NULL / IS NOT NULL explicitly written? |
| Aggregation |
COUNT(col) too low |
Is a non‑null count mistaken for a total row count? |
| Conditional |
CASE WHEN col = NULL not working |
Is the comparison syntax correct? |
| Grouping | NULL and empty string conflated | Is the grouping semantics standardized first? |
Common Pitfalls
- Treating NULL as an empty string — especially common in text columns. They look the same but have different semantics.
- No agreed business semantics, so developers write their own defaults — one report treats NULL as "Unknown", another simply excludes it. Without alignment, results diverge.
- Ignoring NULLs introduced by LEFT JOIN — right‑side columns become NULL, and downstream filters or groupings unwittingly change the data scope.
- Fixing NULLs only at query time, while the raw layer remains messy — if a column persistently contains both NULLs and empty strings, upstream governance is already broken.
Recommendations
-
Make semantics explicit in SQL: Use
CASE WHENto map NULLs and empty strings to business labels instead of relying on mental assumptions. - Schedule regular NULL‑distribution checks for critical columns — treat it as routine inspection, not a last‑minute fire drill.
- Standardize key columns at the detail layer — don't let every downstream query invent its own NULL‑handling logic.
NULL rarely throws an error in a gbase database. It just quietly changes the answer. Separating NULLs, empty strings, and default values before you compare, aggregate, or group is usually the fastest way to close a result‑accuracy debate.
Top comments (0)