Sospeter Mong'are

Posted on Aug 1 • Edited on Aug 3

Count(*) vs Count(1) in SQL

#sql #database #beginners #tutorial

When working with SQL databases, efficient data retrieval and manipulation are crucial. One common operation is counting rows in a table, which can be done using the COUNT function. Two frequently used variations of this function are COUNT(*) and COUNT(1). Although they might seem identical at first glance, there are subtle differences and implications for their usage. This article will delve into these differences, providing a clear understanding of when and how to use each one.

Understanding the Basics

COUNT(*): This function counts all the rows in a table, including those with NULL values. It is the most straightforward way to get the total number of rows.

COUNT(1): This function counts all the rows where the value of the expression inside the parentheses is not NULL. Since 1 is a constant and never NULL, it effectively counts all the rows just like COUNT(*).

Key Differences

Syntax and Intent:
- COUNT(*): Counts every row in the table, regardless of what is in those rows.
- COUNT(1): Counts every row where the value of the expression 1 is not NULL. Since 1 is always not NULL, it counts all the rows.
Performance:
- Historically, some database systems might have optimized these queries differently. Modern database systems, however, treat COUNT(*) and COUNT(1) similarly in terms of performance. They both result in a full table scan if there is no WHERE clause.
- The performance difference is negligible in most modern RDBMS, including MySQL, PostgreSQL, and SQL Server. These systems have optimizers that understand both queries are essentially asking for the same thing: a row count.
Readability:
- COUNT(*) is often preferred for readability because it clearly indicates the intention of counting all rows. It's more intuitive to people reading the code.
- COUNT(1) might be used by some developers who are aware of its equivalence to COUNT(*) but prefer its appearance for personal or historical reasons.

Practical Considerations

Index Utilization:
- When counting rows in a large table, having an appropriate index can significantly speed up the query. Both COUNT(*) and COUNT(1) can benefit from indexes.
- Counting a specific column with COUNT(column_name) can be faster if the column is indexed and doesn't contain NULL values. This is because the database can scan the index instead of the entire table.
Counting Specific Conditions:
- If you need to count rows based on specific conditions, both COUNT(*) and COUNT(1) can be used with a WHERE clause. For example:
```
 SELECT COUNT(*) FROM employees WHERE department = 'Sales';
 SELECT COUNT(1) FROM employees WHERE department = 'Sales';
```

The choice between the two still boils down to personal preference and readability, as the performance remains comparable.

Compatibility:
- Both COUNT(*) and COUNT(1) are standard SQL and are supported by all major relational database systems. Choosing one over the other won't affect the portability of your SQL code.

Conclusion

In summary, COUNT(*) and COUNT(1) are functionally equivalent in most modern SQL databases, both providing the total row count of a table. While performance differences are minimal, COUNT(*) is generally preferred for its clarity and readability. When writing SQL queries, understanding these subtleties can help you write more readable and maintainable code, although it won't have a significant impact on performance in most cases.

For most scenarios, use COUNT(*) for counting rows unless there is a specific reason to prefer COUNT(1) based on your understanding of the database optimizer or personal preference. Always consider readability and maintainability of your code, as future developers will appreciate clear and understandable queries.

By choosing the appropriate COUNT function and leveraging indexes effectively, you can ensure your SQL queries are both efficient and easy to understand.

Top comments (17)

Sairam Mangeshkar • Aug 1

I feel like this adds nothing to the table. If the performance is similar and there's nothing different between the two why bother with COUNT(1) when the standard is COUNT(*)

I hope I'm not missing something.

Lepe • Aug 2

It's a ChatGPT article bruh

Jay H • Aug 2

This article will delve...

Yeah pretty confident it's generated. As soon as I see the word "delve" you can pretty much guarantee it's chatGPT.

kwood-gmr • Aug 2

The difference should begin with "once upon a time..."

At one point in history most database systems would pull the selection into memory during the count. At that time count(1) was much better because it didn't read the rows. But... this has been optimized for a very long time and there are now two conditions where it might matter (assuming the author is correct about * not including nulls):

You have a crotchity old timer who isn't learning.
You have a poorly designed table where all the columns are null.

Martin Baun • Aug 2

Yup, both do the exact same thing.

Abraham • Aug 2

While this is a ChatGPT article and stolen from asktom, it's interesting to note that it isn't actually correct, COUNT(1) is the sum of the value count that the COUNT() function will assign to every row in the table. The function will then count how many times the 1 or even an asterisk (*) or any number in count has been assigned. This number will be assigned a number of times that’s equal to the number of rows in the table. In other words, COUNT(1) assigns the value from the parentheses (number 1, in this case) to every row in the table, then the same function counts how many times the value in the parenthesis (1, in our case) has been assigned; naturally, this will always be equal to the number of rows in the table. The parentheses can contain any value; the only thing that won’t work will be leaving the parentheses empty.

Orix Au Yeung • Aug 2

Thanks ChatGPT!

kwood-gmr • Aug 2

This story could be shorter. It should begin with "once upon a time" and explain that some two decades ago it wasn't unusual to the dbms to read whatever was in the "count" operator as it was counting. So developers learned to save by using "count(1)".

Then someone had the good idea of not bothering to read the data and changed the code accordingly making it no longer matter.

But... in the world of technology we have habits that die hard, and some who don't keep up.

Luis Mesa • Aug 2 • Edited

Content stolen and reposted from asktom...

dstutzman • Aug 3

I'm very confident that this entire article was generated by ChatGPT. If asktom posted the same exact text, then they also posted content generated by ChatGPT.

Travis Peterson • Aug 2 • Edited

If COUNT(column_name) can be faster due to indexing, wouldn't the recommendation be to use COUNT(primary_key)?

Nikunj Bhatt • Aug 2

Yes, COUNT(primary_key) is recommended. However, if a column is simply indexed or uniquely indexed and you use like COUNT(unique_key_col) or COUNT(indexed_key_col) then one has to keep in mind that these will not count those rows whose value of unique_key_col or indexed_key_col columns are null.

Vincent • Aug 2

This is an absolutely true statement. I actually didn't get hired for a job because I answer the question.. Select (1). I even tried to show them with a demo and this was before the internet was capable of backing me up. Lol went on to start a successful software company on my own three times.

Luis Mesa • Aug 2

You "borrowed" this from asktom

Rob Erwin • Aug 2

It's my understanding that COUNT() won't count a row whose columns are all null. Rarely does this happen, but it would imply that the COUNT() does have to do a full table scan and look at every column value. I do like the idea of counting a NOT NULL indexed column though. Also check your database metadata, it might have a rowcount already calculated for you.

𒎏Wii 🏳️‍⚧️ • Aug 2

I sure hope most modern DBMSs are smart enough that, if they track the row-count of their tables, they will optimise count(*) to simply return this count; at least if they know this is always correct like if the table has any not null constraints.