DEV Community

t-o-d
t-o-d

Posted on

Split CSV by number in shell only.

  • Sometimes I have to deal with large CSV files.
  • In this case, dividing and foldering in advance will make it easier to handle.
  • This section describes how to split files and classify folders according to the number of lines using only Shell.

Result

  • The following is the directory structure before execution.
.
├── sample.csv
├── main.sh
Enter fullscreen mode Exit fullscreen mode
  • The following is the description of main.sh
    • sample.csv has 100 rows of data
    • ※Error handling is omitted.
#!/bin/sh
set -e

# file path
[ ! -e "$1" ] && exit 1 || datafile="$1"
# File extension deletion
filename="${datafile%.*}"
# Get number of lines
row=$(grep -c '' $datafile)
# Obtaining the number of splits
sep="$2"
# Number of directories created
dir_cnt=$(awk -v row="$row" -v sep="$sep" 'BEGIN {
    i=row/sep
    printf("%d\n",i+=i<0?0:0.999)
    }
    '
)
# Folder creation
seq -f "${filename}_%01.0f" 1 ${dir_cnt} |
xargs mkdir -p
# File division
split -l ${sep} -a 2 $datafile "${filename}_"
# File movement
count=1
for i in `find . -type f -name "${filename}_*" | sort`
do
    mv $i "${filename}_${count}/${i//_*/_${count}}.csv"
    let count++
done
Enter fullscreen mode Exit fullscreen mode
  • Run as follows.
sh main.sh sample.csv 25
Enter fullscreen mode Exit fullscreen mode
  • After executing, check that the directory structure is as follows.
.
├── main.sh
├── sample.csv
├── sample_1
│   ├── sample_1.csv
├── sample_2
│   ├── sample_2.csv
├── sample_3
│   ├── sample3.csv
├── sample_4
│   ├── sample4.csv
Enter fullscreen mode Exit fullscreen mode

Supplement

Number of created directories

  • Rounding up
    • In the case of a decimal number such as 100/15, the directory is not created normally.
    • Round up to an integer with printf.

File splitting and moving

  • Extension is added by mv.
    • Additional extension (--additional-suffix) in split is not the default on Mac etc.

Link

Sentry image

Hands-on debugging session: instrument, monitor, and fix

Join Lazar for a hands-on session where you’ll build it, break it, debug it, and fix it. You’ll set up Sentry, track errors, use Session Replay and Tracing, and leverage some good ol’ AI to find and fix issues fast.

RSVP here →

Top comments (0)

Heroku

Simplify your DevOps and maximize your time.

Since 2007, Heroku has been the go-to platform for developers as it monitors uptime, performance, and infrastructure concerns, allowing you to focus on writing code.

Learn More

👋 Kindness is contagious

Please leave a ❤️ or a friendly comment on this post if you found it helpful!

Okay