DEV Community

Judy
Judy

Posted on

Add records that meet the criteria before each group after grouping :From SQL to SPL

In a certain view of the PostgreSQL database, the row_index field is an underscore separated string and also a grouping field. Some groups' row_index can be divided into 3 parts, while others can be divided into 2 parts.

Image description

Now, before each group of records where row_index can be divided into 3 parts, add the group of records where row_index can be divided into 2 parts, and modify row_index to the row_index of each group.

Image description
The records where row_index can be divided into three parts can be grouped by row_index, and then each group of records can be processed by merging the records where row_index can be divided into two parts before each group of records. But after SQL grouping, it must aggregate immediately, and subsets cannot be kept to continue processing each group of records. This requires a detour to solve, using multi-layer nested window functions to bypass this problem, which is difficult to code.

SPL supports retaining subsets after grouping, allowing for continued processing of each group of records.

Image description

A1: Query the database through JDBC.

A2: Retrieve the records where row_index can be divided into two parts.

A3: Remove A2 from the complete data, which means getting records that row_index can be divided into three parts. Group these records while keeping the order unchanged. The symbol \ represents the difference set, and group@u indicates keeping the original order after grouping.

A4: Loop each group of data: Create a new two-dimensional table according to A2, change row_index to the row_index of this group, keep other fields unchanged, and merge the new two-dimensional table with the data of this group. Finally, merge the data of each group. The symbol | represents merging set members.

Open source SPL source address

Top comments (0)