In any SQL statement, There are two types of columns.
- columns available via the FROM clause
- columns as SELECT list aliases
I learn something new about QUALIFY clause in BigQuery lately on how it could access these columns differently based on the position of window definition.
Please see example below.
WITH a AS (
    SELECT 
        *
    FROM UNNEST([STRUCT(1 AS val, 'in' AS canopy),
    (2, 'out'),
    (3, 'above'),
    (1, 'in')
    ])
)
-- Approach 1: QUALIFY only have access to columns available via the FROM clause
-- Case 1.1: 
-- SELECT
--     CASE WHEN canopy = 'above' THEN 'in'
--     ELSE canopy
--     END AS mapped_canopy,
--     val,
--     ROW_NUMBER() OVER w1 as rn,
-- FROM a
-- WHERE val IS NOT NULL
-- QUALIFY rn = 1
-- WINDOW w1 AS (PARTITION BY canopy)
-- Case 2: Invalid SQL
-- SELECT
--     CASE WHEN canopy = 'above' THEN 'in'
--     ELSE canopy
--     END AS mapped_canopy,
--     val,
--     ROW_NUMBER() OVER w1 as rn,
-- FROM a
-- WHERE val IS NOT NULL
-- QUALIFY rn = 1
-- WINDOW w1 AS (PARTITION BY mapped_canopy)
-- Approach 2: QUALIFY have access to both columns via the FROM clause and columns as SELECT list aliases
-- Case 3:
SELECT
    CASE WHEN canopy = 'above' THEN 'in'
    ELSE canopy
    END AS mapped_canopy,
    val
FROM a
WHERE val IS NOT NULL
QUALIFY ROW_NUMBER() OVER (PARTITION BY canopy) = 1
-- Case 4
-- SELECT
--     CASE WHEN canopy = 'above' THEN 'in'
--     ELSE canopy
--     END AS mapped_canopy,
--     val
-- FROM a
-- WHERE val IS NOT NULL
-- QUALIFY ROW_NUMBER() OVER (PARTITION BY mapped_canopy) = 1
 

 
    
Top comments (0)