In data analysis and business intelligence, it's crucial to aggregate data quickly and accurately. GBase provides advanced grouping query capabilities, including the GROUP BY ROLLUP
feature, to support multi-dimensional data aggregation and analysis. This article introduces the basic concepts, syntax structure, application examples, and practical use cases of GROUP BY ROLLUP
to help you understand and utilize this feature effectively.
1. Basic Functionality of GROUP BY ROLLUP
GROUP BY ROLLUP
is an advanced grouping query feature in GBase 8s that allows for grouping and aggregation based on specified columns, supporting different levels of summary calculations.
Basic Syntax:
SELECT column1, column2, ...
FROM table_name
GROUP BY ROLLUP(column1, column2, ...);
When used, the system first groups by the specified columns and then performs a UNION ALL
on the multiple result sets.
-
Number of Groups: Number of columns
n + 1
. -
Grouping Method: Start by grouping by all specified columns, then reduce one column from right to left sequentially, until no columns are used for grouping. Columns not involved in grouping will return
NULL
in the result set, and the results of these groups will be returned.
For example, if the ROLLUP
grouping columns are (A, B, C)
, it will first group by (A, B, C)
, then by (A, B)
, then by (A)
, and finally, by no columns at all. The query results are the UNION ALL
of each group’s result set.
Summary Calculation:
At each grouping level, aggregate functions such as SUM
, AVG
, MAX
, etc., can be used to calculate summary values. For instance, the SUM
function can calculate the total for each group.
Result Set:
The result set returned by GROUP BY ROLLUP
includes the union of all grouped data without removing duplicates. This means that statistics for each group are listed separately.
Auxiliary Functions:
-
GROUPING()
Function: Used to determine whether a specific group is a summary row, returning1
for summary rows and0
otherwise. -
GROUPING_ID()
Function: Identifies the grouping level, taking one or more columns as parameters and returning a binary value representing the grouping level of those columns.
2. Example Analysis
Suppose there is a sales data table sales
with fields year
(year), region
(region), and sales
(sales amount).
Note: Example database version: GBase8sV8.8_3.5.1
Creating the sales
table:
CREATE TABLE sales (
id INT PRIMARY KEY,
year INT,
region VARCHAR(50),
sales DECIMAL(10, 2)
);
Inserting data:
INSERT INTO sales (id, year, region, sales) VALUES (1, 2020, 'North China', 10000.00);
INSERT INTO sales (id, year, region, sales) VALUES (2, 2020, 'South China', 15000.00);
INSERT INTO sales (id, year, region, sales) VALUES (3, 2021, 'North China', 12000.00);
INSERT INTO sales (id, year, region, sales) VALUES (4, 2021, 'South China', 18000.00);
INSERT INTO sales (id, year, region, sales) VALUES (5, 2021, 'East China', 20000.00);
Example Query Using GROUP BY ROLLUP
:
SELECT year, region, SUM(sales) AS total_sales
FROM sales
GROUP BY ROLLUP(year, region);
Example Results:
YEAR 2021
REGION South China
TOTAL_SALES 18000.00
YEAR 2020
REGION South China
TOTAL_SALES 15000.00
YEAR 2021
REGION North China
TOTAL_SALES 12000.00
YEAR 2021
REGION East China
TOTAL_SALES 20000.00
YEAR 2020
REGION North China
TOTAL_SALES 10000.00
YEAR 2020
REGION
TOTAL_SALES 25000.00
YEAR 2021
REGION
TOTAL_SALES 50000.00
YEAR
REGION
TOTAL_SALES 75000.00
8 row(s) retrieved.
Query Results:
- Sales by year and region.
- Total sales by year (with
NULL
for the region). - Total sales for all years and regions (
NULL
for both year and region).
Using the GROUPING_ID()
Function:
SELECT
year,
region,
SUM(sales) AS total_sales,
DECODE(GROUPING_ID(year, region), 1, 'Yearly Total Sales', 0, 'Yearly Regional Subtotal', 3, 'Global Total Sales')
FROM
sales
GROUP BY
ROLLUP(year, region);
Example Results:
YEAR 2021
REGION South China
TOTAL_SALES 18000.00
(EXPRESSION) Yearly Regional Subtotal
YEAR 2020
REGION South China
TOTAL_SALES 15000.00
(EXPRESSION) Yearly Regional Subtotal
YEAR 2021
REGION North China
TOTAL_SALES 12000.00
(EXPRESSION) Yearly Regional Subtotal
YEAR 2021
REGION East China
TOTAL_SALES 20000.00
(EXPRESSION) Yearly Regional Subtotal
YEAR 2020
REGION North China
TOTAL_SALES 10000.00
(EXPRESSION) Yearly Regional Subtotal
YEAR 2020
REGION
TOTAL_SALES 25000.00
(EXPRESSION) Yearly Total Sales
YEAR 2021
REGION
TOTAL_SALES 50000.00
(EXPRESSION) Yearly Total Sales
YEAR
REGION
TOTAL_SALES 75000.00
(EXPRESSION) Global Total Sales
8 row(s) retrieved.
3. Use Cases
GROUP BY ROLLUP
is suitable for various data analysis scenarios, including but not limited to:
- Sales Analysis: Generate aggregated sales reports based on different sales regions, product categories, and time dimensions, helping companies understand sales performance and formulate strategies.
- Financial Reporting: Generate aggregated financial reports based on different accounts, departments, and time dimensions, assisting companies in financial analysis and decision-making.
- User Statistics: Create aggregated user reports based on different geographic locations, age groups, and genders, helping businesses understand user characteristics and behaviors.
As an advanced feature in GBase 8s, GROUP BY ROLLUP
enhances the flexibility and convenience of data aggregation. This detailed introduction and example demonstration aim to help you better grasp this feature, enabling you to achieve more in your data analysis tasks. Thank you for reading!
Top comments (0)