When working with SQL databases, efficient data retrieval and manipulation are crucial. One common operation is counting rows in a table, which can be done using the COUNT
function. Two frequently used variations of this function are COUNT(*)
and COUNT(1)
. Although they might seem identical at first glance, there are subtle differences and implications for their usage. This article will delve into these differences, providing a clear understanding of when and how to use each one.
Understanding the Basics
COUNT(*): This function counts all the rows in a table, including those with NULL
values. It is the most straightforward way to get the total number of rows.
COUNT(1): This function counts all the rows where the value of the expression inside the parentheses is not NULL
. Since 1
is a constant and never NULL
, it effectively counts all the rows just like COUNT(*)
.
Key Differences
-
Syntax and Intent:
-
COUNT(*)
: Counts every row in the table, regardless of what is in those rows. -
COUNT(1)
: Counts every row where the value of the expression1
is notNULL
. Since1
is always notNULL
, it counts all the rows.
-
-
Performance:
- Historically, some database systems might have optimized these queries differently. Modern database systems, however, treat
COUNT(*)
andCOUNT(1)
similarly in terms of performance. They both result in a full table scan if there is noWHERE
clause. - The performance difference is negligible in most modern RDBMS, including MySQL, PostgreSQL, and SQL Server. These systems have optimizers that understand both queries are essentially asking for the same thing: a row count.
- Historically, some database systems might have optimized these queries differently. Modern database systems, however, treat
-
Readability:
-
COUNT(*)
is often preferred for readability because it clearly indicates the intention of counting all rows. It's more intuitive to people reading the code. -
COUNT(1)
might be used by some developers who are aware of its equivalence toCOUNT(*)
but prefer its appearance for personal or historical reasons.
-
Practical Considerations
-
Index Utilization:
- When counting rows in a large table, having an appropriate index can significantly speed up the query. Both
COUNT(*)
andCOUNT(1)
can benefit from indexes. - Counting a specific column with
COUNT(column_name)
can be faster if the column is indexed and doesn't containNULL
values. This is because the database can scan the index instead of the entire table.
- When counting rows in a large table, having an appropriate index can significantly speed up the query. Both
-
Counting Specific Conditions:
- If you need to count rows based on specific conditions, both
COUNT(*)
andCOUNT(1)
can be used with aWHERE
clause. For example:
SELECT COUNT(*) FROM employees WHERE department = 'Sales'; SELECT COUNT(1) FROM employees WHERE department = 'Sales';
- If you need to count rows based on specific conditions, both
- The choice between the two still boils down to personal preference and readability, as the performance remains comparable.
-
Compatibility:
- Both
COUNT(*)
andCOUNT(1)
are standard SQL and are supported by all major relational database systems. Choosing one over the other won't affect the portability of your SQL code.
- Both
Conclusion
In summary, COUNT(*)
and COUNT(1)
are functionally equivalent in most modern SQL databases, both providing the total row count of a table. While performance differences are minimal, COUNT(*)
is generally preferred for its clarity and readability. When writing SQL queries, understanding these subtleties can help you write more readable and maintainable code, although it won't have a significant impact on performance in most cases.
For most scenarios, use COUNT(*)
for counting rows unless there is a specific reason to prefer COUNT(1)
based on your understanding of the database optimizer or personal preference. Always consider readability and maintainability of your code, as future developers will appreciate clear and understandable queries.
By choosing the appropriate COUNT
function and leveraging indexes effectively, you can ensure your SQL queries are both efficient and easy to understand.
Top comments (17)
I feel like this adds nothing to the table. If the performance is similar and there's nothing different between the two why bother with COUNT(1) when the standard is COUNT(*)
I hope I'm not missing something.
It's a ChatGPT article bruh
Yeah pretty confident it's generated. As soon as I see the word "delve" you can pretty much guarantee it's chatGPT.
The difference should begin with "once upon a time..."
At one point in history most database systems would pull the selection into memory during the count. At that time count(1) was much better because it didn't read the rows. But... this has been optimized for a very long time and there are now two conditions where it might matter (assuming the author is correct about * not including nulls):
Yup, both do the exact same thing.
While this is a ChatGPT article and stolen from asktom, it's interesting to note that it isn't actually correct, COUNT(1) is the sum of the value count that the COUNT() function will assign to every row in the table. The function will then count how many times the 1 or even an asterisk (*) or any number in count has been assigned. This number will be assigned a number of times that’s equal to the number of rows in the table. In other words, COUNT(1) assigns the value from the parentheses (number 1, in this case) to every row in the table, then the same function counts how many times the value in the parenthesis (1, in our case) has been assigned; naturally, this will always be equal to the number of rows in the table. The parentheses can contain any value; the only thing that won’t work will be leaving the parentheses empty.
Thanks ChatGPT!
This story could be shorter. It should begin with "once upon a time" and explain that some two decades ago it wasn't unusual to the dbms to read whatever was in the "count" operator as it was counting. So developers learned to save by using "count(1)".
Then someone had the good idea of not bothering to read the data and changed the code accordingly making it no longer matter.
But... in the world of technology we have habits that die hard, and some who don't keep up.
Content stolen and reposted from asktom...
I'm very confident that this entire article was generated by ChatGPT. If asktom posted the same exact text, then they also posted content generated by ChatGPT.
If COUNT(column_name) can be faster due to indexing, wouldn't the recommendation be to use COUNT(primary_key)?
Yes,
COUNT(primary_key)
is recommended. However, if a column is simply indexed or uniquely indexed and you use likeCOUNT(unique_key_col)
orCOUNT(indexed_key_col)
then one has to keep in mind that these will not count those rows whose value ofunique_key_col
orindexed_key_col
columns are null.This is an absolutely true statement. I actually didn't get hired for a job because I answer the question.. Select (1). I even tried to show them with a demo and this was before the internet was capable of backing me up. Lol went on to start a successful software company on my own three times.
You "borrowed" this from asktom
It's my understanding that COUNT() won't count a row whose columns are all null. Rarely does this happen, but it would imply that the COUNT() does have to do a full table scan and look at every column value. I do like the idea of counting a NOT NULL indexed column though. Also check your database metadata, it might have a rowcount already calculated for you.
I sure hope most modern DBMSs are smart enough that, if they track the row-count of their tables, they will optimise
count(*)
to simply return this count; at least if they know this is always correct like if the table has any not null constraints.I totally agree with all comments below. Useless