count(*) vs count(1) vs count(column_name) in GBase 8c

#gbase #database #数据库

In GBase 8c, the China‑domestically developed database from GBASE, count is one of the most frequently used functions. Developers often mix count(*), count(1), and count(column_name) without fully understanding the differences. This article breaks down their similarities and critical distinctions with concrete examples.

What They Have in Common

Purpose: All three aggregate and return the number of rows as bigint.
Usage: They all work with WHERE, GROUP BY, and HAVING clauses.
Result without NULLs: When no NULLs exist in the specified columns, all three return identical results.

Test table:

CREATE TABLE student(id INT PRIMARY KEY, name VARCHAR(50), age INT);
INSERT INTO student VALUES(1, 'aaa', 20), (2, 'bbb', 21), (3, 'ccc', 22);

SELECT count(*) FROM student;   -- 3
SELECT count(1) FROM student;   -- 3
SELECT count(id) FROM student;  -- 3

Key Differences

1. NULL Handling

count(*) and count(1): Count row existence, unaffected by NULLs in any column.
count(column_name): Counts only non‑NULL values in the specified column; NULLs are silently ignored.

Insert a row with NULLs:

INSERT INTO student VALUES(4, NULL, NULL);

SELECT count(*) FROM student;   -- 4
SELECT count(1) FROM student;   -- 4
SELECT count(name) FROM student;-- 3
SELECT count(age) FROM student; -- 3
SELECT count(id) FROM student;  -- 4

2. Semantics and Readability

count(*): The SQL standard way — "count all rows". Clear and universal.
count(1): An extension — "count if the record exists" via the constant 1. Less readable.
count(column_name): Means "count non‑NULL values in this column", not simply "count rows".

3. Performance

count(*) and count(1): GBase 8c's optimizer transforms both into the same execution plan. No full table scan is required; both are equally fast.
count(column_name): Performance depends on indexing. With an index, it's efficient. Without an index, a full table scan plus row‑by‑row NULL checks is needed — the larger the table, the bigger the gap.

Recommendations

To count rows: Use count(*). It's clear, standard, and optimized in a gbase database.
To count non‑NULL values: Use count(column_name) and ensure the column is indexed.
count(1) works as a substitute, but it's not the best choice when readability matters.

Understanding these distinctions helps you write more precise and performant queries on GBASE's GBase 8c, avoiding subtle statistical errors caused by NULLs.

DEV Community