Judy

Split a Huge CSV File into Multiple Smaller CSV Files #eg69

Problem description & analysis

Below is the CSV file sample.csv:

v2aowqhugt,q640lwdtat,8cqw2gtm0g,ybdncfeue8,3tzwyiouft,…

f0ewv2v00z,x2ck96ngmd,9htr2874n5,fx430s8wqy,tw40yn3t0j,…

p2h6fphwco,kldbn6rbzt,8okyllngxz,a8k9slqfms,bqz5fb7cm9,…

st63tcbfv8,2n862vqzww,2equ0ydeet,0x5tidunc6,npis28avpj,…

bn1u58s39a,mg7064jlrb,edyj3t4s95,zvuf9n29ai,1m0yn8uh0n,…

The file contains too much data to load into memory all at once; at most 100000 rows fit in the available memory at a time. So we need to split the file into multiple smaller CSV files of 100000 rows each, as shown below:

sample1.csv  100000 rows

sample2.csv  100000 rows

…

sample[n].csv  less than or equal to 100000 rows

Solution

Write the script p1.dfx below in esProc:
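Here is a minimal sketch of the three-cell grid, assuming the CSV has no header row (add the @t option to cursor() and export() if it does):

A1  =file("sample.csv").cursor@c()
A2  for A1,100000
B2  =file("sample"/#A2/".csv").export@c(A2)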
Explanation

A1  Create a cursor for the original CSV file.

A2  Loop through A1’s cursor, reading in 100000 rows at a time.

B2  Export A2’s rows to sample[n].csv. #A2 is the loop number, which starts from 1.

Read How to Call an SPL Script in Java to learn how to integrate the script code into a Java program.
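For orientation, here is a minimal Java sketch of that call through esProc's JDBC driver; the driver class and URL follow esProc's documented JDBC setup, and the script name p1 is assumed to be deployed in the configured script directory:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class CallSpl {
    public static void main(String[] args) throws Exception {
        // Load esProc's embedded JDBC driver
        Class.forName("com.esproc.jdbc.InternalDriver");
        try (Connection con = DriverManager.getConnection("jdbc:esproc:local://")) {
            // "call p1()" runs the p1 script (p1.dfx) from the configured script path
            try (PreparedStatement st = con.prepareCall("call p1()")) {
                st.execute();
            }
        }
    }
}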

SPL open source address

Download


Top comments (1)

Collapse
 
esproc_spl profile image
Judy

Download and try it, it will surprise you!
