DEV Community

t-o-d

Split a CSV by line count using only shell

  • I sometimes have to work with large CSV files.
  • Splitting such a file into smaller pieces and sorting them into folders in advance makes it easier to handle.
  • This post shows how to split a file by a fixed number of lines and place each piece in its own folder, using only shell.

Result

  • The following is the directory structure before execution.
.
├── sample.csv
└── main.sh
  • The contents of main.sh are as follows.
    • sample.csv has 100 rows of data.
    • Note: error handling is omitted.
#!/bin/sh
set -e

# Input file path (exit if the file does not exist)
[ -e "$1" ] || exit 1
datafile="$1"
# Strip the file extension
filename="${datafile%.*}"
# Get the number of lines
row=$(grep -c '' "$datafile")
# Number of lines per split file
sep="$2"
# Number of directories to create (row / sep, rounded up)
dir_cnt=$(awk -v row="$row" -v sep="$sep" 'BEGIN {
    i = int(row / sep)
    if (i * sep < row) i++
    printf("%d\n", i)
    }
    '
)
# Create the directories
seq -f "${filename}_%01.0f" 1 "${dir_cnt}" |
xargs mkdir -p
# Split the file
split -l "${sep}" -a 2 "$datafile" "${filename}_"
# Move each piece into its directory, adding the .csv extension
count=1
for i in $(find . -type f -name "${filename}_[a-z][a-z]" | sort)
do
    mv "$i" "${filename}_${count}/${filename}_${count}.csv"
    count=$((count + 1))
done
  • Run as follows.
sh main.sh sample.csv 25
  • After executing, check that the directory structure is as follows.
.
├── main.sh
├── sample.csv
├── sample_1
│   └── sample_1.csv
├── sample_2
│   └── sample_2.csv
├── sample_3
│   └── sample_3.csv
└── sample_4
    └── sample_4.csv
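One way to confirm the result is to count the lines in each generated file. The sketch below assumes the name_N/name_N.csv layout shown above. count_split_lines is a hypothetical helper name, not part of main.sh.

```shell
#!/bin/sh
# count_split_lines (hypothetical helper): prints "<file> <lines>" for
# every CSV found in the <name>_<n>/ directories.
count_split_lines() {
    name="$1"
    for f in "$name"_*/"$name"_*.csv
    do
        # Skip the unexpanded pattern when nothing matched.
        [ -e "$f" ] || continue
        echo "$f $(grep -c '' "$f")"
    done
}

count_split_lines sample
```

With the 100-row sample.csv split by 25, each of the four files should report 25 lines.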

Supplement

Number of created directories

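  • Rounding up
    • When the line count is not evenly divisible by the split size, as with 100/15, truncating the quotient would create one directory too few.
    • The quotient is therefore rounded up to an integer before it is printed with printf.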
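As an alternative illustration of the same rounding, the ceiling can also be computed with shell integer arithmetic alone. This is a minimal sketch, not what main.sh does. ceil_div is a hypothetical helper name, and it assumes both arguments are positive integers.

```shell
#!/bin/sh
# ceil_div (hypothetical helper): rounds row / sep up to an integer.
# Adding sep - 1 before the integer division pushes any remainder
# over to the next whole number.
ceil_div() {
    row="$1"
    sep="$2"
    echo $(( (row + sep - 1) / sep ))
}

ceil_div 100 15   # prints 7
ceil_div 100 25   # prints 4
```

Because only integers are involved, the result does not depend on how small the fractional part of row / sep is.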

File splitting and moving

  • The .csv extension is added by mv.
    • split's --additional-suffix option could add it directly, but that option is not available in the default split on Mac and similar systems.
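The rename step can be sketched on its own as follows. add_csv_suffix is a hypothetical helper name, and the sketch assumes the pieces were created by split -a 2, so each one ends in two lowercase letters.

```shell
#!/bin/sh
# add_csv_suffix (hypothetical helper): appends .csv to every piece
# that `split -a 2` created with the given prefix.
add_csv_suffix() {
    prefix="$1"
    for piece in "$prefix"[a-z][a-z]
    do
        # Skip the unexpanded pattern when nothing matched.
        [ -e "$piece" ] || continue
        mv "$piece" "$piece.csv"
    done
}

# Usage (assumes sample.csv exists in the current directory):
# split -l 25 -a 2 sample.csv sample_
# add_csv_suffix sample_
```

This avoids relying on --additional-suffix, so the same commands should behave the same with either the GNU or the BSD version of split.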
