DEV Community

Alan Reid

Splitting large CSV files

So I managed to get my hands on a very large CSV file over the weekend. When I say large, I mean almost 9GB large! Yeah, that's a lot of data.

Well, needless to say, I had issues opening said file. It really didn't wanna work with Excel, VS Code, or even my old faithful Sublime Text!!

So I went down a different path and decided to split the file into more manageable chunks. For this we will use the terminal and perform the following tasks:

  • Split by line count, or by file size (KB or MB);
  • Then add a .csv extension to all the files created by the split.

So to start, we need to work out our best option. I began by splitting the file into 100MB chunks. In the second command we use a for loop and the mv command to change the extension of each new file.

Running these commands gives us files small enough to open, so we can look at the first one and see how many rows it contains.

Note: make sure you have navigated to the correct folder, i.e. the one where the file is saved. Then run the following snippet.

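# Split into 100MB pieces named xaa, xab, xac, ... (split's default prefix is "x")
split -b 100m file_to_split.csv
# Add a .csv extension to the new pieces only (a bare * would also rename file_to_split.csv)
for i in x*; do mv "$i" "$i.csv"; done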
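If you are on Linux with GNU coreutils, split can add the extension itself, which removes the need for the rename loop. This is a sketch that assumes GNU split's --additional-suffix option; the BSD split that ships with macOS may not support it, in which case the loop above is the portable route. The function name split_csv_by_size is hypothetical.

```shell
# Sketch assuming GNU coreutils split (Linux); --additional-suffix may be
# missing from the BSD split that ships with macOS.
# split_csv_by_size is a hypothetical helper name, not part of any tool.
split_csv_by_size() {
  local file=$1 size=$2
  # Writes xaa.csv, xab.csv, ... directly, so no rename loop is needed
  split -b "$size" --additional-suffix=.csv "$file"
}

# Usage (from the folder that holds the file):
#   split_csv_by_size file_to_split.csv 100m
```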

This will now split our file into 100MB chunks, but we haven't finished there. Because -b cuts at an exact byte count, you will more than likely have rows broken across two files, which is not good if you plan to import the CSVs into a database later. So open the first file that was created and see how many lines it has.
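If even a 100MB chunk is slow to open in your editor, the same check can be done from the terminal with wc and tail. This is a sketch: chunk_report is a hypothetical helper name, and xaa.csv assumes split's default naming. Note that wc -l counts newline characters, so a cut-off final row is not included in its total (an editor will usually count it as an extra line).

```shell
# chunk_report FILE: print how many complete (newline-terminated) lines FILE
# has and whether its last row was cut off part-way through.
# chunk_report is a hypothetical helper name, not part of any tool.
chunk_report() {
  local lines
  lines=$(wc -l < "$1")
  # $((...)) strips the leading spaces that some wc versions print
  echo "lines: $((lines))"
  # A complete row ends in a newline; $(...) drops a trailing newline,
  # so a non-empty result means the final row has no line ending.
  if [ -n "$(tail -c 1 "$1")" ]; then
    echo "last row: broken"
  else
    echo "last row: complete"
  fi
}

# Usage: chunk_report xaa.csv
```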

In my case there were 415156, but the last row was broken. To sort this out, delete all the files that were created. Sorry, I found this the easiest way, but bear with me; there is a reason for it.

OK, so now that we know how many rows we can expect per file, let's re-run the previous snippet, this time replacing the file size with a line count.

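# Split into files of 415000 lines each, named xaa, xab, xac, ...
split -l 415000 file_to_split.csv
# Add a .csv extension to the new pieces only (a bare * would also rename file_to_split.csv)
for i in x*; do mv "$i" "$i.csv"; done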
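Before importing anything, it is worth confirming that the pieces add back up to the original file. This sketch assumes the chunks are the only files matching x*.csv in the folder, and relies on the shell listing xaa.csv, xab.csv, ... in the same order split created them; verify_split is a hypothetical helper name.

```shell
# verify_split ORIGINAL: check that the chunks, concatenated in name order,
# reproduce ORIGINAL byte for byte.
# verify_split is a hypothetical helper name, not part of any tool.
verify_split() {
  if cat x*.csv | cmp -s - "$1"; then
    echo "chunks match $1"
  else
    echo "chunks differ from $1" >&2
    return 1
  fi
}

# Usage: verify_split file_to_split.csv
```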

To summarise: the -b flag tells split to cut the file into chunks of a fixed size. If we want files of 1MB, that's easy: we would use 1m. The -l flag, however, tells split how many lines of data to put in each file before starting a new one; in my case it was 415000. Because -l only ever cuts between lines, every row stays whole (as long as no quoted field contains a line break).
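On Linux, GNU split can also do both passes at once: its -C (--line-bytes) option fills each piece with whole lines up to a byte limit, so you get files of roughly the size you asked for with no broken rows. This is a sketch assuming GNU coreutils (the BSD split on macOS may not have -C or --additional-suffix); split_keep_rows is a hypothetical helper name. Like -l, it only cuts between lines.

```shell
# Sketch assuming GNU coreutils split. -C/--line-bytes puts as many whole
# lines as fit under SIZE bytes into each piece, so no row is cut in half.
# split_keep_rows is a hypothetical helper name, not part of any tool.
split_keep_rows() {
  local file=$1 size=$2
  split -C "$size" --additional-suffix=.csv "$file"
}

# Usage: split_keep_rows file_to_split.csv 100m
```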

Originally posted on r3id.dev

Top comments (2)

Cesar Mendoza 🦟

Your article saved my life, thank you so much for posting this.

Shivam Verma

Hi. I am getting this error