SEQ & PASTE: An effective way to add sequential numbers to the beginning of a file in bash

Rehan Qadir ・3 min read

We all learn new techniques every day. A couple of weeks ago I wrote a post on how to add sequential numbers to the beginning of each line of a data file (say, a CSV file) using the sed command in Linux.

Discrepancy

That technique turned out to scale poorly to files with millions of lines, because it uses a for loop and goes through the file line by line to insert the sequential numbers.

New Technique

Recently I explored the seq and paste Linux commands, and I was amazed at how effectively they accomplish the same goal without a for loop or any line-by-line manipulation of the data file.

Let's dive in and see how we can leverage these two commands.

seq n prints the sequential numbers from 1 to n, one per line.

seq 10
# Output
# 1
# 2
# 3
# 4
# 5
# 6
# 7
# 8
# 9
# 10
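
seq is not limited to starting at 1. As a small sketch, assuming the GNU coreutils or BSD version of seq (both accept the FIRST LAST and FIRST INCREMENT LAST forms), a start value and a step can be given as well:

```shell
# Two arguments: FIRST and LAST
seq 3 6
# Output
# 3
# 4
# 5
# 6

# Three arguments: FIRST, INCREMENT and LAST
seq 0 5 20
# Output
# 0
# 5
# 10
# 15
# 20
```

This is handy when the numbering should start at 0, or continue from where another file left off.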

We can redirect the output of the command to a file, say, serial.

seq 1024 > serial

wc -l serial
# Output
# 1024 serial

The second command we are going to explore is paste. It takes two or more files and merges their corresponding lines into columns, separated by a tab character by default.

Let's say we have a cities file with the names of different cities in it

Prague
Montreal
Amsterdam
Rome
Barcelona

and a serial file with the following numbers

1
2
3
4
5

Let's use the paste command on the serial and cities files.

paste serial cities
# Output
# 1 Prague
# 2 Montreal
# 3 Amsterdam
# 4 Rome
# 5 Barcelona

As we can see, the two columns are separated by whitespace (a single tab character, which is paste's default delimiter). We will see how to change the delimiter shortly.
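
The whitespace between the columns is hard to identify by eye. The sketch below recreates the two sample files in a scratch directory (the mktemp step is my own addition, so no existing files get overwritten) and uses tr to turn each tab into a visible | character:

```shell
# Work in a scratch directory so existing files are untouched (assumption)
cd "$(mktemp -d)" || exit 1

# Recreate the two sample files
seq 5 > serial
printf '%s\n' Prague Montreal Amsterdam Rome Barcelona > cities

# Translate every tab into "|" to make paste's delimiter visible
paste serial cities | tr '\t' '|'
# Output
# 1|Prague
# 2|Montreal
# 3|Amsterdam
# 4|Rome
# 5|Barcelona
```

Exactly one | appears per line, which suggests that paste inserts a single tab between the two columns rather than a run of spaces.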

Wind things up

Let's put the two commands together to accomplish our target.

Suppose we have a CSV file mock.csv with the following content

Ryder,eget.metus.In@congueturpis.com,El Quisco
Keelie,vel.faucibus.id@libero.net,Temuka
Hamilton,enim.non.nisi@Maecenas.co.uk,Leeds
Lani,neque.sed.dictum@et.net,Largs
Gloria,montes.nascetur@nisl.co.uk,Vejalpur

and we wish to add sequential numbers to the beginning of each line. Here is the final bash script.

#!/bin/bash

# Generate sequence of numbers (1 to 5)
seq 5 > serial

# Use paste to combine two files with comma as a delimiter
paste -d, serial mock.csv > data.csv

cat data.csv
# Output
# 1,Ryder,eget.metus.In@congueturpis.com,El Quisco
# 2,Keelie,vel.faucibus.id@libero.net,Temuka
# 3,Hamilton,enim.non.nisi@Maecenas.co.uk,Leeds
# 4,Lani,neque.sed.dictum@et.net,Largs
# 5,Gloria,montes.nascetur@nisl.co.uk,Vejalpur

Voilà! That was simple and elegant. Note that we passed the -d, option to paste to set the delimiter to a comma rather than the default tab.

Note that we hard-coded 5 as the argument to seq because we know beforehand that mock.csv contains 5 rows. What if we don't know the number of rows in the CSV file? Simply use the wc -l command and extract the number of lines using awk or cut.
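
The intermediate serial file is optional. paste accepts - as a file operand meaning standard input, so seq can be piped straight into it. A minimal sketch, which recreates mock.csv in a scratch directory (the mktemp step is my own addition, not part of the script above):

```shell
#!/bin/bash
# Work in a scratch directory so existing files are untouched (assumption)
cd "$(mktemp -d)" || exit 1

# Recreate the sample mock.csv
cat > mock.csv <<'EOF'
Ryder,eget.metus.In@congueturpis.com,El Quisco
Keelie,vel.faucibus.id@libero.net,Temuka
Hamilton,enim.non.nisi@Maecenas.co.uk,Leeds
Lani,neque.sed.dictum@et.net,Largs
Gloria,montes.nascetur@nisl.co.uk,Vejalpur
EOF

# "-" makes paste read its first column from standard input,
# so the numbers never have to be written to a file
seq 5 | paste -d, - mock.csv > data.csv

cat data.csv
```

The resulting data.csv should match the output of the two-step version, with one less file to clean up afterwards.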

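# Avoid the name LINES: bash reserves it for the terminal height
num_lines=$(wc -l mock.csv | awk '{print $1}')
# With GNU wc, cut works too (BSD wc pads its output with leading spaces):
# num_lines=$(wc -l mock.csv | cut -d ' ' -f1)

# Use the num_lines variable as the argument to seq
seq "$num_lines"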
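
Putting the line count and the paste step together, the whole job can be sketched as one small function. The name number_lines and the sample file are hypothetical, chosen only for illustration. The sketch also assumes that wc -l reading from standard input prints only the count, and that paste treats - as standard input:

```shell
#!/bin/bash
# number_lines is a hypothetical helper name (not from the original script).
# It prints every line of the given file prefixed with its 1-based number.
number_lines() {
    local file=$1
    local count
    # Reading from stdin makes wc print the count without the file name
    count=$(wc -l < "$file")
    # Nothing to number in an empty file
    [ "$count" -gt 0 ] || return 0
    # $((count)) drops the padding that some wc versions add
    seq "$((count))" | paste -d, - "$file"
}

# Usage with a small throwaway file (assumption: any newline-terminated file)
sample=$(mktemp)
printf '%s\n' alpha beta gamma > "$sample"
number_lines "$sample"
# Output
# 1,alpha
# 2,beta
# 3,gamma
```

Because wc -l counts newline characters, the file is assumed to end with a newline. Otherwise the last line is not counted and paste emits it with an empty number field.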

That's it.


I hope it will help you. Keep Coding!

Posted on May 26 by Rehan Qadir (@drraq)
