
Judy


Split a large csv file into smaller files #eg45

A csv file has a size far greater than 5M. Below is part of its data:

[Image: sample rows from the csv file]
Use Java to do this: split the file into smaller files of about 5M each, with ordinal numbers in the file names, such as Orders1.csv and Orders2.csv. Each record must go into exactly one file and must never be cut across two files.

Write the following SPL code:

[Image: SPL script]
A2: Compute the number of smaller files (N) the csv file will be divided into. The symbol \ performs integer division, keeping only the integer part of the quotient; adding 1 makes each smaller file a bit less than 5M.
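The arithmetic behind A2 can be sketched in plain Java. This is only an illustration, not the SPL code from the post: the class and method names are hypothetical, and it assumes "5M" means 5 × 1024 × 1024 bytes.

```java
class PartCount {
    // Integer division plus one, mirroring the description of A2.
    // Because size / target + 1 is always greater than the exact quotient,
    // size / N is always strictly below the target.
    static long partCount(long fileSizeBytes, long targetBytes) {
        return fileSizeBytes / targetBytes + 1;
    }

    public static void main(String[] args) {
        long target = 5L * 1024 * 1024;   // assumption: "5M" means 5 MiB
        long size = 23L * 1024 * 1024;    // hypothetical 23 MiB input file
        long n = partCount(size, target);
        System.out.println(n + " parts, about " + (size / n) + " bytes each");
    }
}
```

For the hypothetical 23 MiB file this gives N = 5, so each part holds about 4.6 MiB. A file of exactly 10 MiB gives N = 3 rather than 2, which is what keeps every part under the limit.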

A3: Loop from 1 to N. Each pass treats the large file as N parts of roughly equal size, retrieves the ith part, and writes it to a new file; SPL automatically adjusts the part boundaries so that every record stays complete.
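For readers who want to see what that loop involves without SPL, here is a minimal plain-Java sketch of the same idea. It is not the author's script and not how SPL is implemented. `CsvSplitter`, `split`, and `alignToRecordStart` are hypothetical names, and the sketch assumes that every record ends with `\n`, that no quoted field contains a line break, and that the header row (if any) is not repeated in each output file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class CsvSplitter {

    // Returns the offset of the first record that starts at or after pos.
    // A record starts at offset 0 or immediately after a '\n' byte.
    static long alignToRecordStart(RandomAccessFile raf, long pos) throws IOException {
        long size = raf.length();
        if (pos <= 0) return 0;
        if (pos >= size) return size;
        raf.seek(pos - 1);                 // the byte before pos may already be '\n'
        int b;
        while ((b = raf.read()) != -1) {
            if (b == '\n') return raf.getFilePointer();
        }
        return size;                       // no newline left: the last record runs to EOF
    }

    // Splits source into prefix1.csv .. prefixN.csv inside outDir and returns N.
    static int split(Path source, Path outDir, String prefix, long targetBytes) {
        try (RandomAccessFile raf = new RandomAccessFile(source.toFile(), "r");
             FileChannel in = raf.getChannel()) {
            long size = raf.length();
            int n = (int) (size / targetBytes + 1);   // same rule as A2
            long start = 0;
            for (int i = 1; i <= n; i++) {
                // Nominal end of part i is size * i / n, moved forward to a record boundary.
                long end = (i == n) ? size : alignToRecordStart(raf, size * i / n);
                Path out = outDir.resolve(prefix + i + ".csv");
                try (FileChannel dst = FileChannel.open(out,
                        StandardOpenOption.CREATE,
                        StandardOpenOption.WRITE,
                        StandardOpenOption.TRUNCATE_EXISTING)) {
                    long copied = start;
                    while (copied < end) {
                        copied += in.transferTo(copied, end - copied, dst);
                    }
                }
                start = end;               // the next part begins where this one stopped
            }
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A call such as `CsvSplitter.split(Paths.get("Orders.csv"), Paths.get("."), "Orders", 5L * 1024 * 1024)` would write Orders1.csv, Orders2.csv, and so on. Because each boundary is moved forward to the end of the record it lands in, a part can exceed its nominal share by up to one record, and a part can come out empty if a single record is longer than a whole segment.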

Read How to Call a SPL Script in Java to find how to integrate SPL into a Java application.

This problem comes from StackOverflow, where the conventional solution is quite complicated. The SPL approach, by comparison, is simple and efficient.

SPL open source address

Download
