DEV Community

Judy
Judy

Posted on

2 1 1 1 1

Update the Latest Data in a CSV File to the Database #eg66

Problem description & analysis

Below is CSV file emp.csv:

EMPID,FIRSTNAME,LASTNAME,EMAIL,UPDATEDATE

123,John,Smith,john.smith01@email.com,01/01/2020

234,Bruce,Waye,bruce.wayne@wayneenterprises.com,02/02/2020

123,John,Smith,john.smith02@email.com,02/15/2020

345,Clark,Kent,clark.kent@dailyplanet.com,02/16/2020

123,John,Smith,john.smith03@email.com,02/20/2020

In the above file, UPADATEDATE field is ordered and EMPID field contains duplicates.

We are trying to divide records in the CSV file into two parts:

  1. Write the newest employee records, which are those with the latest UPDATEDATE values after the file is grouped by EMPID, to EMP table;

  2. Write the rest of the records, which are the difference, to EMP_HIS table according to the original order.

Solution

Write the following script p1.dfx in esProc:

Image description
Explanation:

A1  Import data from the CSV file, during which the first row is read as column headers and UPDATEDATE is parsed into date type (while the original date format is MM/dd/yyyy).

A2  Group A1’s table by EMPID and get the last member from each group (the record with the latest UPDATEDATE value).

A3  Get the rest of the records, which is the difference of A1 and A2.

A4  Connect to the database.

A5  Update the latest employee record to the database by EMPID or insert it to EMP table.

A6  Update the historical employee records by EMPID and UPDATEDATE to the database, or insert them to EMP_HIS table.

A7  Close database connection.

Find how to integrate the script code with a Java program in How to Call an SPL Script in Java.

SPL open source address

Download

Postmark Image

Speedy emails, satisfied customers

Are delayed transactional emails costing you user satisfaction? Postmark delivers your emails almost instantly, keeping your customers happy and connected.

Sign up

Top comments (0)

Image of Timescale

Timescale – the developer's data platform for modern apps, built on PostgreSQL

Timescale Cloud is PostgreSQL optimized for speed, scale, and performance. Over 3 million IoT, AI, crypto, and dev tool apps are powered by Timescale. Try it free today! No credit card required.

Try free

👋 Kindness is contagious

Please leave a ❤️ or a friendly comment on this post if you found it helpful!

Okay