In this episode we will be discussing the impact of NULL values in Aggregation. Null value means that a column does not contain any value. It is not an empty string or 0, it is the absence of any value.
NULL
Null* values are excluded from calculations. Whenever we does aggregations such as SUM, the NULL values are not participants in these calculations.
This is not an issue when calculations are being performed on MIN, MAX and SUM, because when NULL is ignored is does not alter the calculation in anyway since it is the equivalent of 0 in these specific calculations.
This is an issue when calculations are being performed on Count and AVG, this is the reason for the episode; to demonstrate the impact NULL values have in aggregations such as Count and AVG.
Let’s see this in practice in SSMS.
I have created a simple table to better explain the impact of NULL values, since the size of most tables in AdventureWorks are large which in turn makes it difficult to immediately see the impact they have.
If we check to see the count of the OrderSales table, we will see it picks up all 4 of the values. But if we check to see the count of only the order value column, then we are receive only 3 in total. This means that the Count function only factored in NON-NULL values.
MIN will calculate the minimum value.
MAX will calculate the maximum value.
SUM will calculate the sum of all values.
However, AVG will not work as presumed, instead of dividing by four rows it will not accept NULL as valid and therefore not include it in its calculation. Unfortunately because of this we end up with invalid data.
Average is suppose to take the SUM of all the values which is 60 and divide them by 4, which is the total amount of rows. We should receive 15.
However, we receive back 20. Since the row with a NULL value was not included in the calculation, therefore we only divided the total of 60 by 3.
Keep this in mind when you are querying your database, especially when working with AVG, since this could lead to inaccurate data.
This can be especially troublesome in a large dataset such as AdventureWorks, this behaviour is not immediately apparent and can be easily missed or forgotten.
I hope this episode was helpful, I will try to keep this little bit of knowledge at the back of my mind whenever I am querying databases and I hope you do to. Till next time...
Top comments (0)