DEV Community

Cover image for How to Transform SQL Queries to Crosstabs in PostgreSQL
Kagunda JM
Kagunda JM

Posted on • Originally published at dev.to

How to Transform SQL Queries to Crosstabs in PostgreSQL

A SELECT SQL query retrieves data from a database in a tabular form. The first row, or header row, has the column names, and all the other rows have the data that was retrieved.

Listing of salesmen monthly sales data

In a crosstab, the data is condensed and the names of one or more columns are rotated. You can add row and column totals to a crosstab. For example, a list of salesmen's monthly sales may include the months in the column headers. By rotating the data, it is easier to read and understand how the facts relate to one another.

Sample crosstab

Other names you might come across for crosstabs are matrix reports, pivot for SQL server databases, and pivot tables on spreadsheets like Microsoft Excel, Google Sheets, and LibreOffice.

The following topics are covered in this post:

This post uses queries from a database populated with data from the Northwind database for Postgres. The queries are run using DBeaver SQL client.

Crosstabs Using CASE Conditional Expression

The SQL CASE expression enables you to choose a value depending on a condition, much like an if-then-else conditional statement. The syntax for the CASE expression is as follows:



CASE WHEN condition THEN result
     [WHEN ...]
     [ELSE result]
END


Enter fullscreen mode Exit fullscreen mode

If a condition evaluates to true, the result value will be chosen, otherwise; the result value contained in the optional ELSE expression will be chosen.

In the example below, the SQL CASE statement is used to make a crosstab of monthly total sales by employees for the first four months of 1997.

--- CASE Conditional Expression crosstab using a CTE
WITH cte AS (
SELECT salesman,
SUM(CASE order_month WHEN 1 THEN order_total ELSE 0 END) AS Jan,
SUM(CASE order_month WHEN 2 THEN order_total ELSE 0 END) AS Feb,
SUM(CASE order_month WHEN 3 THEN order_total ELSE 0 END) AS Mar,
SUM(CASE order_month WHEN 4 THEN order_total ELSE 0 END) AS Apr,
SUM(order_total) AS total
FROM
(
SELECT e.last_name || ' ' || e.first_name AS salesman,
date_part('month', order_date) AS order_month,
ROUND((quantity * unit_price)::numeric, 2) AS order_total
FROM employees e
JOIN orders o ON e.employee_id = o.employee_id
JOIN order_details od ON o.order_id = od.order_id
WHERE date_part('year', order_date) = 1997
AND date_part('month', order_date) BETWEEN 1 AND 4
) employee_sales
GROUP by 1
)
SELECT * FROM cte
UNION ALL
SELECT 'Totals ',
SUM(jan),
SUM(feb),
SUM(mar),
SUM(apr),
SUM(total) AS total
FROM cte

If you run the query above, you will get a crosstab like the one below.

employees monthly total sales crosstab during the first four months of 1997

You can combine the previous query with a UNION ALL operator and the following SQL SELECT query to include the total for each month in the cross-tab.

--- UNION ALL operator to include totals for each month
UNION ALL
SELECT 'Totals ',
SUM(CASE order_month WHEN 1 THEN order_total ELSE 0 END) AS Jan,
SUM(CASE order_month WHEN 2 THEN order_total ELSE 0 END) AS Feb,
SUM(CASE order_month WHEN 3 THEN order_total ELSE 0 END) AS Mar,
SUM(CASE order_month WHEN 4 THEN order_total ELSE 0 END) AS Apr,
SUM(order_total) AS total
FROM
(
SELECT
date_part('month', order_date) AS order_month,
ROUND(SUM(quantity * unit_price)::numeric, 2) AS order_total
FROM
orders o
JOIN order_details od ON o.order_id = od.order_id
WHERE date_part('year', order_date) = 1997
AND date_part('month', order_date) BETWEEN 1 AND 4
GROUP BY 1
) employee_sales

employees monthly total sales crosstab during the first four months 1997 including month totals using SQL CASE expression

The SQL CASE expression is supported by most database systems.

Using Common Table Expressions (CTE) For A Crosstab

Common Table Expressions (CTE) in SQL let you create a temporary table in a SELECT SQL statement. It is then possible to reference this named temporary table in subsequent SELECT, INSERT, UPDATE, or DELETE SQL statements. The CTEs simplify the SQL queries by breaking them down into smaller, more manageable parts that are easier to read and understand. In PostgreSQL, CTEs are called WITH queries.

You can rewrite the previous CASE crosstab query using a CTE as follows:

--- CASE Conditional Expression crosstab using a CTE
WITH cte AS (
SELECT salesman,
SUM(CASE order_month WHEN 1 THEN order_total ELSE 0 END) AS Jan,
SUM(CASE order_month WHEN 2 THEN order_total ELSE 0 END) AS Feb,
SUM(CASE order_month WHEN 3 THEN order_total ELSE 0 END) AS Mar,
SUM(CASE order_month WHEN 4 THEN order_total ELSE 0 END) AS Apr,
SUM(order_total) AS total
FROM
(
SELECT e.last_name || ' ' || e.first_name AS salesman,
date_part('month', order_date) AS order_month,
ROUND((quantity * unit_price)::numeric, 2) AS order_total
FROM employees e
JOIN orders o ON e.employee_id = o.employee_id
JOIN order_details od ON o.order_id = od.order_id
WHERE date_part('year', order_date) = 1997
AND date_part('month', order_date) BETWEEN 1 AND 4
) employee_sales
GROUP by 1
)
SELECT * FROM cte
UNION ALL
SELECT 'Totals ',
SUM(jan),
SUM(feb),
SUM(mar),
SUM(apr),
SUM(total) AS total
FROM cte

Crosstabs Using Aggregate FILTER Clause

From PostgreSQL 9.4, you can use the FILTER clause to perform aggregate functions on specific records. The FILTER clause is less wordy and has a cleaner syntax than the CASE statement.

The following SQL query uses the FILTER clause to generate a crosstab.

---- Crosstabs Using Aggregate FILTER Clause
WITH cte As (
SELECT salesman ,
COALESCE(SUM(order_total) FILTER ( WHERE order_month = 1), 0) AS Jan,
COALESCE(SUM(order_total) FILTER ( WHERE order_month = 2), 0) AS Feb,
COALESCE(SUM(order_total) FILTER ( WHERE order_month = 3), 0) AS Mar,
COALESCE(SUM(order_total) FILTER ( WHERE order_month = 4), 0) AS Apr,
COALESCE(SUM(order_total), 0) AS total
FROM
(
SELECT e.last_name || ' ' || e.first_name AS salesman ,
date_part('month', order_date) AS order_month,
ROUND((quantity * unit_price)::numeric, 2) AS order_total
FROM
employees e
JOIN orders o ON e.employee_id = o.employee_id
JOIN order_details od ON o.order_id = od.order_id
WHERE date_part('year', order_date) = 1997
AND date_part('month', order_date) BETWEEN 1 AND 4
) employee_sales
GROUP by 1
)
SELECT * FROM cte
UNION ALL
SELECT 'Totals ',
SUM(jan),
SUM(feb),
SUM(mar),
SUM(apr),
SUM(total) AS total
FROM cte

The SQL COALESCE function replaces any null values in the crosstab with zero (0) values.

employees monthly total sales crosstab during the first four months of 1997 including month totals using aggregate FILTER clause

Using PostgreSQL crosstab() Function

The crosstab() function is part of the optional tablefunc module.

You can run the SELECT COUNT(*) FROM pg_extension WHERE extname='tablefunc'; query to see if the tablefunc extension is installed on the database you are using. If the result of the query is 0, install and activate the tablefunc⁣ extension using the CREATE EXTENSION IF NOT EXIST tablefunc;⁣ SQL command. The tablefunc⁣ module can be installed by non-superusers who have the CREATE⁣ privilege.

The crosstab function has several options, crosstab(sql text), crosstabN(sql text), and crosstab(source_sql text, category_sql text). Examples in this post use the crosstab(source_sql text, category_sql text) function option.

The first parameter (source_sql text) in a crosstab(source_sql text, category_sql text) function is the source SQL SELECT query statement and must return at least three (3) columns of data to pivot or rotate. The first column (row_name) contains data values to be used as row identifiers in the final result; data in the second column (category) represents category values that will be rotated to column headers in the pivot table, and the third column (value) contains data to be assigned to each cell of the final crosstab. The second parameter text category_sql is a query returning a category list for the columns.

The crosstab function returns a set of records with unknown data types. Therefore, you must alias the returned columns with column names and types using the AS (col1 type, col2 type, ...) clause. Failing to alias the columns will cause a column definition list is required for functions returning "record" error.

The SQL query below shows how to use the crosstab(source_sql text, category_sql text) function. A dollar-quoted string constant ($$) has been used to maintain formatting consistency and remove the need to escape single quotes (') by doubling them.

-- Using PostgreSQL crosstab() Function
SELECT *
FROM crosstab(
$$
SELECT e.last_name || ' ' || e.first_name salesman,
date_part('Mon',order_date) AS order_month,
SUM(ROUND((quantity * unit_price)::numeric, 2)) AS order_total
FROM employees e
JOIN orders o ON e.employee_id = o.employee_id
JOIN order_details od ON o.order_id = od.order_id
WHERE date_part('year', order_date) = 1997
AND date_part('month', order_date) BETWEEN 1 AND 4
GROUP BY 1, 2
UNION ALL
SELECT 'month_total', date_part('month', order_date) AS order_month,
ROUND(SUM(quantity * unit_price)::numeric, 2) AS order_total
FROM orders o
JOIN order_details od ON o.order_id = od.order_id
WHERE date_part('year', order_date) = 1997
AND date_part('month', order_date) BETWEEN 1 AND 4
GROUP BY 1,2
$$,
$$
SELECT DISTINCT date_part('month', order_date) AS order_month
FROM orders
WHERE date_part('year', order_date) = 1997
AND date_part('month', order_date) BETWEEN 1 AND 4
ORDER BY 1
$$
) AS ct (salesman text, Jan numeric, Feb numeric, Mar numeric, Apr numeric);

employees monthly total sales crosstab during the first four months of 1997 using the crosstab() function

Using A PostgreSQL crosstab() Function With More Than Three Columns

If your source_sql text SQL query returns more than three columns, the additional or extra columns must be placed between the row_name and category columns. In the previous crosstab SQL query, any extra or additional columns must be placed between the salesman and order_month columns.

PostgreSQL converts identifiers/column names to lowercase by default. To capitalize the crosstab column headers, enclose them within double quotes.

-- Using A PostgreSQL crosstab() Function With More Than Three Columns
SELECT employee AS "Salesman", title AS "Title", emp_total AS "Total", COALESCE(Jan, 0) AS "Jan",
COALESCE(Feb, 0) AS "Feb", mar AS "Mar", COALESCE(Apr, 0) AS "Apr"
FROM crosstab(
$$
SELECT e.last_name || ' ' || e.first_name as salesman,
e.title ,
et.emp_total ,
date_part('Mon',order_date) AS order_month,
SUM(ROUND((quantity * unit_price)::numeric, 2)) AS order_total
FROM employees e
JOIN (
SELECT e.employee_id ,
SUM(ROUND((quantity * unit_price)::numeric, 2)) AS emp_total
FROM employees e
JOIN orders o ON e.employee_id = o.employee_id
JOIN order_details od ON o.order_id = od.order_id
WHERE date_part('year', order_date) = 1997
AND date_part('month', order_date) BETWEEN 1 AND 4
GROUP BY 1
) et on e.employee_id = et.employee_id
JOIN orders o ON e.employee_id = o.employee_id
JOIN order_details od ON o.order_id = od.order_id
WHERE date_part('year', order_date) = 1997
AND date_part('month', order_date) BETWEEN 1 AND 4
GROUP BY 1,2,3,4
UNION ALL
SELECT '' AS salesman,
'Month Total' AS title,
(
SELECT SUM(ROUND((quantity * unit_price)::numeric, 2)) AS order_total
FROM orders o
JOIN order_details od ON o.order_id = od.order_id
WHERE date_part('year', order_date) = 1997
AND date_part('month', order_date) BETWEEN 1 AND 4
) AS col_total,
date_part('Mon',order_date) AS order_month,
SUM(ROUND((quantity * unit_price)::numeric, 2)) AS order_total
FROM employees e
JOIN orders o ON e.employee_id = o.employee_id
JOIN order_details od ON o.order_id = od.order_id
WHERE date_part('year', order_date) = 1997
AND date_part('month', order_date) BETWEEN 1 AND 4
GROUP BY 1,2,3, 4
$$,
$$
SELECT DISTINCT date_part('month', order_date) AS order_month
FROM orders
WHERE date_part('year', order_date) = 1997
AND date_part('month', order_date) BETWEEN 1 AND 4
ORDER BY 1
$$
) AS ct (employee text, title text, emp_total numeric, Jan numeric, Feb numeric, Mar numeric, Apr numeric);

employees monthly total sales crosstab during the first four months of 1997 using crosstab() function with more than 3 columns

Using ARRAY Data Type To Re-Arrange Extra Crosstab Columns

A crosstab with more than three columns has limitations in the order placement of the additional columns in the final crosstab. However, you can re-arrange the crosstab columns by inserting the extra columns into an arrays data type.

-- Using ARRAY Data Type To Re-Arrange Extra Crosstab Columns
SELECT employee[1] AS "Salesman", employee[2] AS "Title",
COALESCE(Jan, 0) AS "Jan", COALESCE(Feb, 0) AS "Feb", mar AS "Mar", COALESCE(Apr, 0) AS "Apr"
, employee[3]::numeric AS "Total"
FROM crosstab(
$$
SELECT ARRAY[e.last_name || ' ' || e.first_name, e.title, et.emp_total::text] AS employee,
date_part('Mon',order_date) AS order_month,
SUM(ROUND((quantity * unit_price)::numeric, 2)) AS order_total
FROM employees e
JOIN (
SELECT e.employee_id ,
SUM(ROUND((quantity * unit_price)::numeric, 2)) AS emp_total
FROM employees e
JOIN orders o ON e.employee_id = o.employee_id
JOIN order_details od ON o.order_id = od.order_id
WHERE date_part('year', order_date) = 1997
AND date_part('month', order_date) BETWEEN 1 AND 4
GROUP BY 1
) et on e.employee_id = et.employee_id
JOIN orders o ON e.employee_id = o.employee_id
JOIN order_details od ON o.order_id = od.order_id
WHERE date_part('year', order_date) = 1997
AND date_part('month', order_date) BETWEEN 1 AND 4
GROUP BY 1,2
UNION ALL
SELECT ARRAY[
'', 'Month Total',
(
SELECT SUM(ROUND((quantity * unit_price)::numeric, 2))::text
FROM orders o
JOIN order_details od ON o.order_id = od.order_id
WHERE date_part('year', order_date) = 1997
AND date_part('month', order_date) BETWEEN 1 AND 4
)
] AS employee,
date_part('Mon',order_date) AS order_month,
SUM(ROUND((quantity * unit_price)::numeric, 2)) AS order_total
FROM employees e
JOIN orders o ON e.employee_id = o.employee_id
JOIN order_details od ON o.order_id = od.order_id
WHERE date_part('year', order_date) = 1997
AND date_part('month', order_date) BETWEEN 1 AND 4
GROUP BY 1,2
$$,
$$
SELECT DISTINCT date_part('month', order_date) AS order_month
FROM orders
WHERE date_part('year', order_date) = 1997
AND date_part('month', order_date) BETWEEN 1 AND 4
ORDER BY 1
$$
) AS ct (employee text[], Jan numeric, Feb numeric, Mar numeric, Apr numeric);

employees monthly total sales crosstab during the first four months 1997 using crosstab() function with re-arranged columns

Conclusion

Crosstabs are a powerful way to summarize and analyze data from a database. By presenting data in a condensed and organized format, crosstabs make it easier to analyze relationships between different variables.

The CASE conditional expression, aggregate FILTER clause, and the PostgreSQL crosstab() function are some of the methods for creating crosstabs. If the crosstab() function uses more than three columns to summarize data, the ARRAY data type may be used to re-arrange the extra columns into the correct positions. Using crosstabs, PostgreSQL users can create reports and dashboards that help them make informed business decisions.

Resources

  1. CrossTab Queries in PostgreSQL using tablefunc contrib
  2. Static and dynamic pivots

Top comments (0)