In any SQL statement, There are two types of columns.
- columns available via the FROM clause
- columns as SELECT list aliases
I learn something new about QUALIFY clause in BigQuery lately on how it could access these columns differently based on the position of window definition.
Please see example below.
WITH a AS (
SELECT
*
FROM UNNEST([STRUCT(1 AS val, 'in' AS canopy),
(2, 'out'),
(3, 'above'),
(1, 'in')
])
)
-- Approach 1: QUALIFY only have access to columns available via the FROM clause
-- Case 1.1:
-- SELECT
-- CASE WHEN canopy = 'above' THEN 'in'
-- ELSE canopy
-- END AS mapped_canopy,
-- val,
-- ROW_NUMBER() OVER w1 as rn,
-- FROM a
-- WHERE val IS NOT NULL
-- QUALIFY rn = 1
-- WINDOW w1 AS (PARTITION BY canopy)
-- Case 2: Invalid SQL
-- SELECT
-- CASE WHEN canopy = 'above' THEN 'in'
-- ELSE canopy
-- END AS mapped_canopy,
-- val,
-- ROW_NUMBER() OVER w1 as rn,
-- FROM a
-- WHERE val IS NOT NULL
-- QUALIFY rn = 1
-- WINDOW w1 AS (PARTITION BY mapped_canopy)
-- Approach 2: QUALIFY have access to both columns via the FROM clause and columns as SELECT list aliases
-- Case 3:
SELECT
CASE WHEN canopy = 'above' THEN 'in'
ELSE canopy
END AS mapped_canopy,
val
FROM a
WHERE val IS NOT NULL
QUALIFY ROW_NUMBER() OVER (PARTITION BY canopy) = 1
-- Case 4
-- SELECT
-- CASE WHEN canopy = 'above' THEN 'in'
-- ELSE canopy
-- END AS mapped_canopy,
-- val
-- FROM a
-- WHERE val IS NOT NULL
-- QUALIFY ROW_NUMBER() OVER (PARTITION BY mapped_canopy) = 1
Top comments (0)