DEV Community

hexfloor
hexfloor

Posted on

SQL Pro Tips : industrial Oracle SQL using WITH

Introduction

A common industrial flow of data analysis consists of :

  • data preparation
  • data aggregation
  • data join
  • data output

In this guide I will show how to implement efficiently data preparation, aggregation and output.
Feel free to check on Oracle XE Quick Start and Oracle SQL CROSS JOIN with UNPIVOT, SQL WITH clause.

Problem definition

Let's define the following problem : as the input we have sale revenue per day, and we wish to have a json as the output with the sale revenue percentage per day of week and the sale revenue percentage per day of week within the group : weekdays and weekends.

Parameters

In order to write professional SQL we should better define all the parameters within a separate block :

WITH
PARAMETERS AS (
    SELECT 
        'SATURDAY' AS WEEKEND_V0,
        'SUNDAY' AS WEEKEND_V1,
        'WEEKEND' AS WEEKEND_LABEL,
        'WORKWEEK' AS WORKWEEK_LABEL
    FROM DUAL
)
SELECT * FROM PARAMETERS;
Enter fullscreen mode Exit fullscreen mode

Image description
This is a simple example how to use with. Let's extend the success and go further.

Test data

Now let's generate some test data : date and sale revenue for the date :

WITH
PARAMETERS AS (
    SELECT 
        'SATURDAY' AS WEEKEND_V0,
        'SUNDAY' AS WEEKEND_V1,
        'WEEKEND' AS WEEKEND_LABEL,
        'WORKWEEK' AS WORKWEEK_LABEL
    FROM DUAL
),
DATA_IN AS (
    SELECT
        TO_TIMESTAMP('2024-01-01', 'YYYY-MM-DD') + (ROWNUM - 1) AS SALE_DATE,
        SQRT(ROWNUM * 31 ) AS SALE_REVENUE
    FROM DUAL
    CONNECT BY LEVEL <= 7
)
SELECT * FROM DATA_IN;
Enter fullscreen mode Exit fullscreen mode

Image description
Great! Now let's prepare our data for aggregation.

Data preparation

At this stage we need to be sure that we have all the necessary information for the data aggregation and that all the data types are the ones that we need. Feel free to check the guide on decimal data types which is applicable to our use case as well.

WITH
PARAMETERS AS (
    SELECT 
        'SATURDAY' AS WEEKEND_V0,
        'SUNDAY' AS WEEKEND_V1,
        'WEEKEND' AS WEEKEND_LABEL,
        'WORKWEEK' AS WORKWEEK_LABEL
    FROM DUAL
),
DATA_IN AS (
    SELECT
        TO_TIMESTAMP('2024-01-01', 'YYYY-MM-DD') + (ROWNUM - 1) AS SALE_DATE,
        SQRT(ROWNUM * 31 ) AS SALE_REVENUE
    FROM DUAL
    CONNECT BY LEVEL <= 7
),
DATA_PREP AS (
    SELECT 
        DATA_IN_PREP.SALE_DATE,
        DATA_IN_PREP.SALE_REVENUE,
        DATA_IN_PREP.SALE_DOW,
        (CASE WHEN (DATA_IN_PREP.SALE_DOW = PARAMETERS.WEEKEND_V0 OR DATA_IN_PREP.SALE_DOW = PARAMETERS.WEEKEND_V1) 
            THEN WEEKEND_LABEL ELSE WORKWEEK_LABEL END) AS SALE_TYPE        
    FROM (
        SELECT
            SALE_DATE,
            CAST(SALE_REVENUE AS DECIMAL (31, 2)) AS SALE_REVENUE,
            TRIM(TO_CHAR(SALE_DATE, 'DAY')) AS SALE_DOW
        FROM DATA_IN) DATA_IN_PREP
    CROSS JOIN PARAMETERS
)
SELECT * FROM DATA_PREP;
Enter fullscreen mode Exit fullscreen mode

Image description
Good! Now we are set to start data aggregation.

Data aggregation

With properly prepared data the data aggregation is straight forward :

with 
parameters as (
    select 
        'SATURDAY' as weekend_v0,
        'SUNDAY' as weekend_v1,
        'WEEKEND' as weekend_label,
        'WORKWEEK' as workweek_label
),
data_in as (
  select 
    date_add(date('2024-01-01'), interval (value - 1) DAY) as sale_date,
    sqrt(value) as sale_revenue
  from unnest(generate_array(1, 7)) value
),
data_prep as (
  select 
    data_in_prep.sale_date,
    data_in_prep.sale_revenue,
    data_in_prep.sale_dow,
    (case when (data_in_prep.sale_dow = parameters.weekend_v0 or data_in_prep.sale_dow = parameters.weekend_v1) 
            then weekend_label else workweek_label end) as sale_type
  from (
        select
            sale_date,
            round(cast(sale_revenue as numeric),2) as sale_revenue,
            trim(upper(format_date('%A',sale_date))) as sale_dow
        from data_in
  ) data_in_prep cross join parameters
),
data_agg as (
    select
        sale_dow,
        sale_revenue,
        sale_type,
        round(cast(100 * sale_revenue / (sum(sale_revenue) over (partition by sale_type)) 
            as numeric), 2) as sale_type_revenue_perc
    from data_prep
    order by sale_revenue asc

)
select * from data_agg;
Enter fullscreen mode Exit fullscreen mode

Now we are done with the aggregation and as we have a single data source we will skip join and we may start with data output. Depending on the consumer we may have multiple data outputs by caching the results of data aggregation with a view creation.

Data output

Oracle has a native support for json output :

WITH
PARAMETERS AS (
    SELECT 
        'SATURDAY' AS WEEKEND_V0,
        'SUNDAY' AS WEEKEND_V1,
        'WEEKEND' AS WEEKEND_LABEL,
        'WORKWEEK' AS WORKWEEK_LABEL
    FROM DUAL
),
DATA_IN AS (
    SELECT
        TO_TIMESTAMP('2024-01-01', 'YYYY-MM-DD') + (ROWNUM - 1) AS SALE_DATE,
        SQRT(ROWNUM * 31 ) AS SALE_REVENUE
    FROM DUAL
    CONNECT BY LEVEL <= 7
),
DATA_PREP AS (
    SELECT 
        DATA_IN_PREP.SALE_DATE,
        DATA_IN_PREP.SALE_REVENUE,
        DATA_IN_PREP.SALE_DOW,
        (CASE WHEN (DATA_IN_PREP.SALE_DOW = PARAMETERS.WEEKEND_V0 OR DATA_IN_PREP.SALE_DOW = PARAMETERS.WEEKEND_V1) 
            THEN WEEKEND_LABEL ELSE WORKWEEK_LABEL END) AS SALE_TYPE        
    FROM (
        SELECT
            SALE_DATE,
            CAST(SALE_REVENUE AS DECIMAL (31, 2)) AS SALE_REVENUE,
            TRIM(TO_CHAR(SALE_DATE, 'DAY')) AS SALE_DOW
        FROM DATA_IN) DATA_IN_PREP
    CROSS JOIN PARAMETERS
),
DATA_AGG AS (
    SELECT
        SALE_DOW,
        SALE_REVENUE,
        SALE_TYPE,
        CAST(100 * SALE_REVENUE / (SUM(SALE_REVENUE) OVER (PARTITION BY SALE_TYPE)) 
            AS DECIMAL(31,2)) AS SALE_TYPE_REVENUE_PERC
    FROM DATA_PREP
    ORDER BY SALE_REVENUE ASC

),
DATA_OUTPUT AS (
    SELECT JSON_OBJECT(*) AS JSON_OUTPUT FROM(
        SELECT  JSON_ARRAYAGG (  
            JSON_OBJECT (*) RETURNING CLOB   
        ) AS SALE_DATA
        FROM DATA_AGG
    )
)
SELECT * FROM DATA_OUTPUT;

Enter fullscreen mode Exit fullscreen mode

and here is the result:

{"SALE_DATA":[{"SALE_DOW":"TUESDAY","SALE_REVENUE":7.87,"SALE_TYPE":"WORKWEEK","SALE_TYPE_REVENUE_PERC":16.86},{"SALE_DOW":"WEDNESDAY","SALE_REVENUE":9.64,"SALE_TYPE":"WORKWEEK","SALE_TYPE_REVENUE_PERC":20.66},{"SALE_DOW":"THURSDAY","SALE_REVENUE":11.14,"SALE_TYPE":"WORKWEEK","SALE_TYPE_REVENUE_PERC":23.87},{"SALE_DOW":"FRIDAY","SALE_REVENUE":12.45,"SALE_TYPE":"WORKWEEK","SALE_TYPE_REVENUE_PERC":26.68},{"SALE_DOW":"SATURDAY","SALE_REVENUE":13.64,"SALE_TYPE":"WEEKEND","SALE_TYPE_REVENUE_PERC":48.08},{"SALE_DOW":"SUNDAY","SALE_REVENUE":14.73,"SALE_TYPE":"WEEKEND","SALE_TYPE_REVENUE_PERC":51.92},{"SALE_DOW":"MONDAY","SALE_REVENUE":5.57,"SALE_TYPE":"WORKWEEK","SALE_TYPE_REVENUE_PERC":11.93}]}
Enter fullscreen mode Exit fullscreen mode

SQL is still good for the data analysis, it's immortal as Microsoft Excel. Feel free to add comments if some steps require additional details. Stay tuned !

Top comments (0)