SQL: Summing categories

#database #dataengineering #csv #programming

As we continue to compare SQL and CsvPath Validation Language, let's look a simple summation case statement. This isn't hard SQL, at all. And it's an example where SQL and CsvPath are roughly equally verbose, which is often not the case, since, within its remit, CsvPath Validation Language is quite concise.

The value of this construction in a CSV or Excel validation scenario could be from adding a rule to this csvpath. Potentially, the summation here could create variables for multiple rules in other csvpaths to use. And, of course, this information could simply be printed as part of a validation report.

You can run this example from any FlightPath Data project's examples folder. FlightPath Data is a free and open source download on the macOS and Windows Stores or Github. FlightPath info and links are here.

~
This csvpath sums up two header values in a similar way to this SQL:
SELECT
    SUM (CASE
       WHEN neighborhood IN ('Roxbury','South Boston')
       THEN worker_hours_this_period
       ELSE 0 END) AS roxbury_south_boston_hours,
    SUM (CASE
       WHEN trade IN ('Iron Worker','Pipefitter')
       THEN worker_hours_this_period
       ELSE 0 END) AS iron_worker_pipefitter_hours
FROM projects

id: hello world
test-data: examples/counting/projects.csv
~
$[1*][
    ~ capture the raw numbers, skipping the header row ~
    subtotal.neighborhood(#neighborhood, #worker_hours_this_period)
    subtotal.trade(#trade, #worker_hours_this_period)

    ~ aggregate the categories ~
    @trade_hours = add(
       get("neighborhood", "Roxbury"),
       get("neighborhood", "South Boston")
    )
    @neighborhood_hours = add(
       get("trade", "Iron Worker"),
       get("trade", "Pipefitter")
    )

    ~
    on the last line, replace two headers of CSV data with the
     variable data generated by this csvpath and collect those
     headers to our results
    ~
    last() -> replace( #neighborhood, @neighborhood_hours )
    last() -> replace( #trade, @trade_hours )
    last() -> collect( "neighborhood", "trade" )
]

DEV Community

SQL: Summing categories

Top comments (0)