Introduction
A common industrial flow of data analysis consists of :
- data preparation
- data aggregation
- data join
- data output
In this guide I will show how to implement efficiently data preparation, aggregation and output.
Feel free to check on Oracle XE Quick Start and Oracle SQL CROSS JOIN with UNPIVOT, SQL WITH clause.
Problem definition
Let's define the following problem : as the input we have sale revenue per day, and we wish to have a json as the output with the sale revenue percentage per day of week and the sale revenue percentage per day of week within the group : weekdays and weekends.
Parameters
In order to write professional SQL we should better define all the parameters within a separate block :
WITH
PARAMETERS AS (
SELECT
'SATURDAY' AS WEEKEND_V0,
'SUNDAY' AS WEEKEND_V1,
'WEEKEND' AS WEEKEND_LABEL,
'WORKWEEK' AS WORKWEEK_LABEL
FROM DUAL
)
SELECT * FROM PARAMETERS;
This is a simple example how to use with
. Let's extend the success and go further.
Test data
Now let's generate some test data : date and sale revenue for the date :
WITH
PARAMETERS AS (
SELECT
'SATURDAY' AS WEEKEND_V0,
'SUNDAY' AS WEEKEND_V1,
'WEEKEND' AS WEEKEND_LABEL,
'WORKWEEK' AS WORKWEEK_LABEL
FROM DUAL
),
DATA_IN AS (
SELECT
TO_TIMESTAMP('2024-01-01', 'YYYY-MM-DD') + (ROWNUM - 1) AS SALE_DATE,
SQRT(ROWNUM * 31 ) AS SALE_REVENUE
FROM DUAL
CONNECT BY LEVEL <= 7
)
SELECT * FROM DATA_IN;
Great! Now let's prepare our data for aggregation.
Data preparation
At this stage we need to be sure that we have all the necessary information for the data aggregation and that all the data types are the ones that we need. Feel free to check the guide on decimal data types which is applicable to our use case as well.
WITH
PARAMETERS AS (
SELECT
'SATURDAY' AS WEEKEND_V0,
'SUNDAY' AS WEEKEND_V1,
'WEEKEND' AS WEEKEND_LABEL,
'WORKWEEK' AS WORKWEEK_LABEL
FROM DUAL
),
DATA_IN AS (
SELECT
TO_TIMESTAMP('2024-01-01', 'YYYY-MM-DD') + (ROWNUM - 1) AS SALE_DATE,
SQRT(ROWNUM * 31 ) AS SALE_REVENUE
FROM DUAL
CONNECT BY LEVEL <= 7
),
DATA_PREP AS (
SELECT
DATA_IN_PREP.SALE_DATE,
DATA_IN_PREP.SALE_REVENUE,
DATA_IN_PREP.SALE_DOW,
(CASE WHEN (DATA_IN_PREP.SALE_DOW = PARAMETERS.WEEKEND_V0 OR DATA_IN_PREP.SALE_DOW = PARAMETERS.WEEKEND_V1)
THEN WEEKEND_LABEL ELSE WORKWEEK_LABEL END) AS SALE_TYPE
FROM (
SELECT
SALE_DATE,
CAST(SALE_REVENUE AS DECIMAL (31, 2)) AS SALE_REVENUE,
TRIM(TO_CHAR(SALE_DATE, 'DAY')) AS SALE_DOW
FROM DATA_IN) DATA_IN_PREP
CROSS JOIN PARAMETERS
)
SELECT * FROM DATA_PREP;
Good! Now we are set to start data aggregation.
Data aggregation
With properly prepared data the data aggregation is straight forward :
with
parameters as (
select
'SATURDAY' as weekend_v0,
'SUNDAY' as weekend_v1,
'WEEKEND' as weekend_label,
'WORKWEEK' as workweek_label
),
data_in as (
select
date_add(date('2024-01-01'), interval (value - 1) DAY) as sale_date,
sqrt(value) as sale_revenue
from unnest(generate_array(1, 7)) value
),
data_prep as (
select
data_in_prep.sale_date,
data_in_prep.sale_revenue,
data_in_prep.sale_dow,
(case when (data_in_prep.sale_dow = parameters.weekend_v0 or data_in_prep.sale_dow = parameters.weekend_v1)
then weekend_label else workweek_label end) as sale_type
from (
select
sale_date,
round(cast(sale_revenue as numeric),2) as sale_revenue,
trim(upper(format_date('%A',sale_date))) as sale_dow
from data_in
) data_in_prep cross join parameters
),
data_agg as (
select
sale_dow,
sale_revenue,
sale_type,
round(cast(100 * sale_revenue / (sum(sale_revenue) over (partition by sale_type))
as numeric), 2) as sale_type_revenue_perc
from data_prep
order by sale_revenue asc
)
select * from data_agg;
Now we are done with the aggregation and as we have a single data source we will skip join
and we may start with data output. Depending on the consumer we may have multiple data outputs by caching the results of data aggregation with a view creation.
Data output
Oracle has a native support for json output :
WITH
PARAMETERS AS (
SELECT
'SATURDAY' AS WEEKEND_V0,
'SUNDAY' AS WEEKEND_V1,
'WEEKEND' AS WEEKEND_LABEL,
'WORKWEEK' AS WORKWEEK_LABEL
FROM DUAL
),
DATA_IN AS (
SELECT
TO_TIMESTAMP('2024-01-01', 'YYYY-MM-DD') + (ROWNUM - 1) AS SALE_DATE,
SQRT(ROWNUM * 31 ) AS SALE_REVENUE
FROM DUAL
CONNECT BY LEVEL <= 7
),
DATA_PREP AS (
SELECT
DATA_IN_PREP.SALE_DATE,
DATA_IN_PREP.SALE_REVENUE,
DATA_IN_PREP.SALE_DOW,
(CASE WHEN (DATA_IN_PREP.SALE_DOW = PARAMETERS.WEEKEND_V0 OR DATA_IN_PREP.SALE_DOW = PARAMETERS.WEEKEND_V1)
THEN WEEKEND_LABEL ELSE WORKWEEK_LABEL END) AS SALE_TYPE
FROM (
SELECT
SALE_DATE,
CAST(SALE_REVENUE AS DECIMAL (31, 2)) AS SALE_REVENUE,
TRIM(TO_CHAR(SALE_DATE, 'DAY')) AS SALE_DOW
FROM DATA_IN) DATA_IN_PREP
CROSS JOIN PARAMETERS
),
DATA_AGG AS (
SELECT
SALE_DOW,
SALE_REVENUE,
SALE_TYPE,
CAST(100 * SALE_REVENUE / (SUM(SALE_REVENUE) OVER (PARTITION BY SALE_TYPE))
AS DECIMAL(31,2)) AS SALE_TYPE_REVENUE_PERC
FROM DATA_PREP
ORDER BY SALE_REVENUE ASC
),
DATA_OUTPUT AS (
SELECT JSON_OBJECT(*) AS JSON_OUTPUT FROM(
SELECT JSON_ARRAYAGG (
JSON_OBJECT (*) RETURNING CLOB
) AS SALE_DATA
FROM DATA_AGG
)
)
SELECT * FROM DATA_OUTPUT;
and here is the result:
{"SALE_DATA":[{"SALE_DOW":"TUESDAY","SALE_REVENUE":7.87,"SALE_TYPE":"WORKWEEK","SALE_TYPE_REVENUE_PERC":16.86},{"SALE_DOW":"WEDNESDAY","SALE_REVENUE":9.64,"SALE_TYPE":"WORKWEEK","SALE_TYPE_REVENUE_PERC":20.66},{"SALE_DOW":"THURSDAY","SALE_REVENUE":11.14,"SALE_TYPE":"WORKWEEK","SALE_TYPE_REVENUE_PERC":23.87},{"SALE_DOW":"FRIDAY","SALE_REVENUE":12.45,"SALE_TYPE":"WORKWEEK","SALE_TYPE_REVENUE_PERC":26.68},{"SALE_DOW":"SATURDAY","SALE_REVENUE":13.64,"SALE_TYPE":"WEEKEND","SALE_TYPE_REVENUE_PERC":48.08},{"SALE_DOW":"SUNDAY","SALE_REVENUE":14.73,"SALE_TYPE":"WEEKEND","SALE_TYPE_REVENUE_PERC":51.92},{"SALE_DOW":"MONDAY","SALE_REVENUE":5.57,"SALE_TYPE":"WORKWEEK","SALE_TYPE_REVENUE_PERC":11.93}]}
SQL is still good for the data analysis, it's immortal as Microsoft Excel. Feel free to add comments if some steps require additional details. Stay tuned !
Top comments (0)