Dmitry Romanoff

AWK

AWK is a text-processing utility on GNU/Linux.
It is very powerful and uses a simple programming language.
It can solve complex text processing tasks with a few lines of code.

Examples of tasks that can be done with AWK:

Text processing,
Producing formatted text reports,
Performing arithmetic operations,
Performing string operations,
Parsing log files, including log files of DBs,
Constructing queries to populate data into DBs
and many more.

AWK follows a simple workflow: Read, Execute, and Repeat.

Read

AWK reads a line from the input stream (file, pipe, or stdin) and stores it in memory.

Execute

All AWK commands are applied sequentially on the input. By default AWK executes commands
on every line. We can restrict this by providing patterns.

Repeat

This process repeats until the end of the input is reached.

BEGIN Block

The syntax of the BEGIN block is as follows:
BEGIN {awk-commands}
The BEGIN block gets executed at program start-up. It executes only once. This is a good place
to initialize variables. BEGIN is an AWK keyword and hence it must be in upper-case. Please
note that this block is optional.

Body Block

The syntax of the body block is as follows:
/pattern/ {awk-commands}
The body block applies AWK commands on every input line. By default, AWK executes
commands on every line. We can restrict this by providing patterns. Note that there are no
keywords for the Body block.

END Block

The syntax of the END block is as follows:
END {awk-commands}
The END block executes at the end of the program. END is an AWK keyword and hence it must
be in upper-case. Please note that this block is optional.
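
To see the three blocks working together, here is a minimal sketch. It assumes input in the same layout as the marks.txt file used below (serial number, name, subject, mark); the shell variable names marks and report are only illustrative.

```shell
# Sample input in the same layout as marks.txt (serial, name, subject, mark).
marks='1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89'

report=$(printf '%s\n' "$marks" | awk '
BEGIN { print "Name\tMark" }                        # runs once, before any input is read
/a/   { print $2 "\t" $4; total += $4; n++ }        # runs for each line containing "a"
END   { if (n) print "Average\t" total / n }        # runs once, after the last line
')
printf '%s\n' "$report"
```

The BEGIN block prints the header, the body block prints and accumulates only the matching lines, and the END block prints the average of the accumulated marks.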

dmi@dmi-laptop:~/my_awk$ cat marks.txt
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
dmi@dmi-laptop:~/my_awk$ awk 'BEGIN{printf "Sr No\tName\tSub\tMarks\n"}'
Sr No Name Sub Marks
dmi@dmi-laptop:~/my_awk$ awk 'BEGIN{printf "Sr No\tName\tSub\tMarks\n"} {print}' marks.txt
Sr No Name Sub Marks
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
dmi@dmi-laptop:~/my_awk$ awk '{print}' marks.txt
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
dmi@dmi-laptop:~/my_awk$ cat command.awk
{print}
dmi@dmi-laptop:~/my_awk$ awk -f command.awk marks.txt
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
dmi@dmi-laptop:~/my_awk$
dmi@dmi-laptop:~/my_awk$ awk -v name=Linda 'BEGIN{printf "Name = %s\n", name}'
Name = Linda
dmi@dmi-laptop:~/my_awk$ cat marks.txt
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
dmi@dmi-laptop:~/my_awk$ awk '{print $3 "\t" $4}' marks.txt
Physics 80
Maths 90
Biology 87
English 85
History 89
dmi@dmi-laptop:~/my_awk$

In the following example we search for the pattern a.
When a pattern match succeeds, AWK executes the command from the body block.

dmi@dmi-laptop:~/my_awk$ cat marks.txt
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
dmi@dmi-laptop:~/my_awk$ awk '/a/ {print $0}' marks.txt
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
dmi@dmi-laptop:~/my_awk$

If a pattern has no body block, the default action is taken, which is to print the record.

dmi@dmi-laptop:~/my_awk$ awk '/a/' marks.txt
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
dmi@dmi-laptop:~/my_awk$

We can print columns in any order.

dmi@dmi-laptop:~/my_awk$ awk '/a/ {print $4 "\t" $3}' marks.txt
90 Maths
87 Biology
85 English
89 History
dmi@dmi-laptop:~/my_awk$
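
Formatted reports were mentioned at the start; printf with width specifiers is one way to line columns up. The following is a sketch under the assumption that names and subjects fit in 8 characters; the widths and the variable names marks and table are illustrative.

```shell
marks='1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87'

table=$(printf '%s\n' "$marks" | awk '
BEGIN { printf "%-8s %-8s %5s\n", "Name", "Subject", "Mark" }   # left-aligned text, right-aligned mark
      { printf "%-8s %-8s %5d\n", $2, $3, $4 }
')
printf '%s\n' "$table"
```

%-8s pads a string on the right to 8 characters, while %5d right-aligns a number in 5 characters, so the marks line up under the header.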

We can count and print the number of lines for which a pattern match succeeded.

dmi@dmi-laptop:~/my_awk$ cat marks.txt
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
dmi@dmi-laptop:~/my_awk$ awk '/a/{++cnt} END {print "Count = ", cnt}' marks.txt
Count = 4
dmi@dmi-laptop:~/my_awk$
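
One detail worth knowing: if no line matches, cnt is never assigned and prints as an empty string. A small sketch of a workaround is to add 0 in the END block. The shell function count_matches is a hypothetical helper, not part of AWK.

```shell
marks='1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89'

count_matches() {
    # The first argument is the regular expression to count.
    # cnt + 0 forces a numeric 0 when nothing matched.
    awk -v re="$1" '$0 ~ re { ++cnt } END { print "Count =", cnt + 0 }'
}

with_a=$(printf '%s\n' "$marks" | count_matches 'a')
with_z=$(printf '%s\n' "$marks" | count_matches 'z')
printf '%s\n%s\n' "$with_a" "$with_z"
```

The first call reports the four lines containing a, the second reports zero instead of an empty value.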
dmi@dmi-laptop:~/my_awk$ cat my_example.txt
aaa bbb
cccccc dd
eee
fffff fff ffff
ggg hh hhh hhhh
kkk ll
dmi@dmi-laptop:~/my_awk$ awk 'length($0) > 3' my_example.txt
aaa bbb
cccccc dd
fffff fff ffff
ggg hh hhh hhhh
kkk ll
dmi@dmi-laptop:~/my_awk$ awk 'length($0) > 5' my_example.txt
aaa bbb
cccccc dd
fffff fff ffff
ggg hh hhh hhhh
kkk ll
dmi@dmi-laptop:~/my_awk$ awk 'length($0) > 8' my_example.txt
cccccc dd
fffff fff ffff
ggg hh hhh hhhh
dmi@dmi-laptop:~/my_awk$

The $0 variable stores the entire line.
When a pattern has no body block, the default action is taken, i.e., the record is printed.

ARGC is a standard AWK variable.
It holds the number of arguments provided at the command line, including the program name awk itself.

dmi@dmi-laptop:~/my_awk$ awk 'BEGIN {print "Arguments =", ARGC}' One Two Three Four
Arguments = 5

ARGV is a standard AWK variable.
It is an array that stores the command-line arguments.
The array's valid index ranges from 0 to ARGC-1.

dmi@dmi-laptop:~/my_awk$ cat command.awk
BEGIN {
    for (i = 0; i < ARGC; ++i) {
        printf "ARGV[%d] = %s\n", i, ARGV[i]
    }
}
dmi@dmi-laptop:~/my_awk$ awk -f command.awk one two three four five six seven eight
ARGV[0] = awk
ARGV[1] = one
ARGV[2] = two
ARGV[3] = three
ARGV[4] = four
ARGV[5] = five
ARGV[6] = six
ARGV[7] = seven
ARGV[8] = eight
dmi@dmi-laptop:~/my_awk$
dmi@dmi-laptop:~/my_awk$ awk 'BEGIN {
    for (i = 0; i < ARGC; ++i) {
        printf "ARGV[%d] = %s\n", i, ARGV[i]
    }
} ' one two three four five six seven eight
ARGV[0] = awk
ARGV[1] = one
ARGV[2] = two
ARGV[3] = three
ARGV[4] = four
ARGV[5] = five
ARGV[6] = six
ARGV[7] = seven
ARGV[8] = eight
dmi@dmi-laptop:~/my_awk$

Regular expression .
It matches any single character except the end of line character.

dmi@dmi-laptop:~/my_awk$ echo -e "cat\nbat\nfun\nfin\nfan"
cat
bat
fun
fin
fan
(The -e option of echo enables interpretation of backslash escapes such as \n.)
dmi@dmi-laptop:~/my_awk$ echo -e "cat\nbat\nfun\nfin\nfan" | awk '/f.n/'
fun
fin
fan
dmi@dmi-laptop:~/my_awk$

Regular expression ^
It matches the start of the line.

dmi@dmi-laptop:~/my_awk$ echo -e "This\nThat\nThere\nTheir\nthese"
This
That
There
Their
these
dmi@dmi-laptop:~/my_awk$ echo -e "This\nThat\nThere\nTheir\nthese" | awk '/^The/'
There
Their
dmi@dmi-laptop:~/my_awk$

Regular expression $
It matches the end of the line.

dmi@dmi-laptop:~/my_awk$ echo -e "knife\nknow\nfun\nfin\nfan\nnine"
knife
know
fun
fin
fan
nine
dmi@dmi-laptop:~/my_awk$ echo -e "knife\nknow\nfun\nfin\nfan\nnine" | awk '/n$/'
fun
fin
fan
dmi@dmi-laptop:~/my_awk$

Regular expression [ ]: character set
It matches exactly one of the characters listed inside the brackets.

dmi@dmi-laptop:~/my_awk$ echo -e "Call\nTall\nBall"
Call
Tall
Ball
dmi@dmi-laptop:~/my_awk$ echo -e "Call\nTall\nBall" | awk '/[CT]all/'
Call
Tall
dmi@dmi-laptop:~/my_awk$

Regular expression [^ ]: exclusive set
In the exclusive set, the ^ negates the set of characters in the square brackets.

dmi@dmi-laptop:~/my_awk$ echo -e "Call\nTall\nBall"
Call
Tall
Ball
dmi@dmi-laptop:~/my_awk$ echo -e "Call\nTall\nBall" | awk '/[^CT]all/'
Ball
dmi@dmi-laptop:~/my_awk$
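
The patterns above are tested against the whole line. As a small sketch, the same regular expressions can also be applied to a single field with the ~ (matches) and !~ (does not match) operators. The two-column sample data and the variable names are made up for illustration.

```shell
words='1 Call
2 Tall
3 Ball
4 Taller'

# Second field is exactly "Call" or "Tall": anchored at both ends.
exact=$(printf '%s\n' "$words" | awk '$2 ~ /^[CT]all$/ { print $2 }')

# Everything else.
others=$(printf '%s\n' "$words" | awk '$2 !~ /^[CT]all$/ { print $2 }')

printf 'exact:\n%s\nothers:\n%s\n' "$exact" "$others"
```

Combining ^ and $ with a character set keeps Taller out of the first result, because the match must cover the whole field.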

How to find the length of each record in a file?

dmi@dmi-laptop:~/my_awk$ cat my_example.txt
aaa bbb
cccccc dd
eee
fffff fff ffff
ggg hh hhh hhhh
kkk ll
dmi@dmi-laptop:~/my_awk$ awk '{print $0, ".....", length($0)}' my_example.txt
aaa bbb ..... 7
cccccc dd ..... 9
eee ..... 3
fffff fff ffff ..... 14
ggg hh hhh hhhh ..... 15
kkk ll ..... 6
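
Building on length($0), a short sketch for finding the longest record: the variables max and line are ordinary AWK variables chosen for this example, and the input is a copy of my_example.txt.

```shell
sample='aaa bbb
cccccc dd
eee
fffff fff ffff
ggg hh hhh hhhh
kkk ll'

longest=$(printf '%s\n' "$sample" | awk '
length($0) > max { max = length($0); line = $0 }   # keep the longest record seen so far
END { print max ": " line }
')
printf '%s\n' "$longest"
```

With this input the result is 15: ggg hh hhh hhhh, which agrees with the lengths listed above.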

Delimiter

The -F option sets the field separator (delimiter); here it is a comma.

dmi@dmi-laptop:~/my_awk$ cat some_file_with_commas.txt
aaa, bbb, ccc, dddd
eee
ff, gggg, hhhh, kk, llllll, mmmm, nnn
ooooo, pppp,qqq
rrr
sss
ttt, uuu,
vvv
dmi@dmi-laptop:~/my_awk$ awk -F, ' { print $2 } ' some_file_with_commas.txt
 bbb

 gggg
 pppp


 uuu

dmi@dmi-laptop:~/my_awk$
dmi@dmi-laptop:~/my_awk$ awk -F, ' length($2)>0 { print $2 } ' some_file_with_commas.txt
 bbb
 gggg
 pppp
 uuu
dmi@dmi-laptop:~/my_awk$
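
With -F, the space after each comma stays part of the next field. One possible way around this, sketched here, is to pass a regular expression as the separator: a comma followed by any number of spaces. The brackets in the output are only there to make the field boundaries visible.

```shell
sample='aaa, bbb, ccc, dddd
eee
ff, gggg, hhhh, kk, llllll, mmmm, nnn
ooooo, pppp,qqq
rrr
sss
ttt, uuu,
vvv'

# NF >= 2 skips the lines that have no second field.
second=$(printf '%s\n' "$sample" | awk -F', *' 'NF >= 2 { print "[" $2 "]" }')
printf '%s\n' "$second"
```

Each second field now comes out without the leading space.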

Sum of file sizes with AWK on a list of files

dmi@dmi-laptop:~/my_awk$ ls -l
total 16
-rw-rw-r-- 1 dmi dmi 99 Dec 11 08:45 command.awk
-rw-rw-r-- 1 dmi dmi 120 Dec 11 08:18 marks.txt
-rw-rw-r-- 1 dmi dmi 60 Dec 11 08:37 my_example.txt
-rw-rw-r-- 1 dmi dmi 100 Dec 11 09:11 some_file_with_commas.txt
dmi@dmi-laptop:~/my_awk$ ls -l | awk '{sum += $5} END {print sum}'
379
dmi@dmi-laptop:~/my_awk$
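
The same sum can be extended to report the number of files as well. This sketch feeds a saved copy of the listing above to awk so that the result is reproducible; with a live ls -l the numbers depend on the directory, and file names containing spaces would shift the later fields.

```shell
listing='total 16
-rw-rw-r-- 1 dmi dmi  99 Dec 11 08:45 command.awk
-rw-rw-r-- 1 dmi dmi 120 Dec 11 08:18 marks.txt
-rw-rw-r-- 1 dmi dmi  60 Dec 11 08:37 my_example.txt
-rw-rw-r-- 1 dmi dmi 100 Dec 11 09:11 some_file_with_commas.txt'

# NR > 1 skips the "total" line, so only real entries are counted.
summary=$(printf '%s\n' "$listing" |
    awk 'NR > 1 { sum += $5; n++ } END { printf "%d files, %d bytes\n", n, sum }')
printf '%s\n' "$summary"
```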
dmi@dmi-laptop:~/my_awk$ ls -l
total 16
-rw-rw-r-- 1 dmi dmi 99 Dec 11 08:45 command.awk
-rw-rw-r-- 1 dmi dmi 120 Dec 11 08:18 marks.txt
-rw-rw-r-- 1 dmi dmi 60 Dec 11 08:37 my_example.txt
-rw-rw-r-- 1 dmi dmi 100 Dec 11 09:11 some_file_with_commas.txt
dmi@dmi-laptop:~/my_awk$ ls -l | awk '$5 < 100 {print $0} '
total 16
-rw-rw-r-- 1 dmi dmi 99 Dec 11 08:45 command.awk
-rw-rw-r-- 1 dmi dmi 60 Dec 11 08:37 my_example.txt
dmi@dmi-laptop:~/my_awk$ ls -l | awk '$5 < 100 {print $9} '

command.awk
my_example.txt
dmi@dmi-laptop:~/my_awk$ ls -l | awk 'length($5)>0 && $5 < 100 {print $9} '
command.awk
my_example.txt
dmi@dmi-laptop:~/my_awk$

Skip first line of file

dmi@dmi-laptop:~/my_awk$ cat some_data_to_populate.data
Name, Address, Birthday, Mark
John, Green street, 2000-01-01, 100
Ann, Apple street, 1980-05-22, 99
Miki, Orange street, 1985-01-01, 97
dmi@dmi-laptop:~/my_awk$ awk '(NR>1)' some_data_to_populate.data
John, Green street, 2000-01-01, 100
Ann, Apple street, 1980-05-22, 99
Miki, Orange street, 1985-01-01, 97
dmi@dmi-laptop:~/my_awk$

AWK's NR variable holds the number of the current record, i.e., how many records have been read so far.

dmi@dmi-laptop:~/my_awk$ cat some_data_to_populate.data
Name, Address, Birthday, Mark
John, Green street, 2000-01-01, 100
Ann, Apple street, 1980-05-22, 99
Miki, Orange street, 1985-01-01, 97
dmi@dmi-laptop:~/my_awk$ awk -F, '(NR>1) { printf("%s", $2) } ' some_data_to_populate.data
Green street Apple street Orange streetdmi@dmi-laptop:~/my_awk$
dmi@dmi-laptop:~/my_awk$ cat some_data_to_populate.data
Name, Address, Birthday, Mark
John, Green street, 2000-01-01, 100
Ann, Apple street, 1980-05-22, 99
Miki, Orange street, 1985-01-01, 97
dmi@dmi-laptop:~/my_awk$ awk -F, '(NR>1) { printf("%s\n", $2) } ' some_data_to_populate.data
Green street
Apple street
Orange street
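
To watch NR change while the input is read, here is a small sketch that prefixes every record with its number and prints the final value in the END block. The data is a shortened copy of some_data_to_populate.data, and the separator ', *' (a comma followed by optional spaces) is a choice made for this example.

```shell
data='Name, Address, Birthday, Mark
John, Green street, 2000-01-01, 100
Ann, Apple street, 1980-05-22, 99'

numbered=$(printf '%s\n' "$data" | awk -F', *' '
    { print NR ": " $1 }               # NR is the number of the current record
END { print "total records: " NR }     # in END it is the number of records read
')
printf '%s\n' "$numbered"
```

The header is record 1, which is why the pattern NR>1 leaves it out.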

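The pattern (NR>1) skips the first record of the file.

In the following examples, \x27 is the hexadecimal escape for the single quote character, which cannot be written literally inside a single-quoted shell string. It is supported by gawk and several other implementations; \047 is the portable octal form.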

dmi@dmi-laptop:~/my_awk$ cat some_file_with_commas.txt
aaa, bbb, ccc, dddd
eee
ff, gggg, hhhh, kk, llllll, mmmm, nnn
ooooo, pppp,qqq
rrr
sss
ttt, uuu,
vvv
dmi@dmi-laptop:~/my_awk$ awk ' { printf("\x27") } ' some_file_with_commas.txt
''''''''dmi@dmi-laptop:~/
dmi@dmi-laptop:~/my_awk$ awk ' { printf("\x27\n") } ' some_file_with_commas.txt
'
'
'
'
'
'
'
'
dmi@dmi-laptop:~/my_awk$
dmi@dmi-laptop:~/my_awk$ cat some_file_with_commas.txt
aaa, bbb, ccc, dddd
eee
ff, gggg, hhhh, kk, llllll, mmmm, nnn
ooooo, pppp,qqq
rrr
sss
ttt, uuu,
vvv
dmi@dmi-laptop:~/my_awk$ awk -F, ' { printf("\x27%s\x27\n", $1) } ' some_file_with_commas.txt
'aaa'
'eee'
'ff'
'ooooo'
'rrr'
'sss'
'ttt'
'vvv'
dmi@dmi-laptop:~/my_awk$
dmi@dmi-laptop:~/my_awk$ cat some_data_to_populate.data | awk -F, ' NR>1 { printf("insert into some_table values(trim(\x27%s\x27), trim(\x27%s\x27), trim(\x27%s\x27), %s);\n", $1, $2, $3, $4); } '
insert into some_table values(trim('John'), trim(' Green street'), trim(' 2000-01-01'), 100);
insert into some_table values(trim('Ann'), trim(' Apple street'), trim(' 1980-05-22'), 99);
insert into some_table values(trim('Miki'), trim(' Orange street'), trim(' 1985-01-01'), 97);
dmi@dmi-laptop:~/my_awk$
dmi@dmi-laptop:~/my_awk$ cat some_data_to_populate.data | awk -F, ' NR>1 { printf("update some_table set the_address=trim(\x27%s\x27), the_birthday=trim(\x27%s\x27), the_mark=%s where the_name=\x27%s\x27;\n", $2, $3, $4, $1); } '
update some_table set the_address=trim(' Green street'), the_birthday=trim(' 2000-01-01'), the_mark= 100 where the_name='John';
update some_table set the_address=trim(' Apple street'), the_birthday=trim(' 1980-05-22'), the_mark= 99 where the_name='Ann';
update some_table set the_address=trim(' Orange street'), the_birthday=trim(' 1985-01-01'), the_mark= 97 where the_name='Miki';
dmi@dmi-laptop:~/my_awk$
dmi@dmi-laptop:~/my_awk$ cat some_data_to_populate.data
Name, Address, Birthday, Mark
John, Green street, 2000-01-01, 100
Ann, Apple street, 1980-05-22, 99
Miki, Orange street, 1985-01-01, 97
dmi@dmi-laptop:~/my_awk$
dmi@dmi-laptop:~/my_awk$ cat some_data_to_populate.data | awk -F, ' NR>1 { printf("insert into some_table values(trim(\x27%s\x27), trim(\x27%s\x27), trim(\x27%s\x27), %s);\n", $1, $2, $3, $4); } ' > RunMe.sql
dmi@dmi-laptop:~/my_awk$ cat RunMe.sql
insert into some_table values(trim('John'), trim(' Green street'), trim(' 2000-01-01'), 100);
insert into some_table values(trim('Ann'), trim(' Apple street'), trim(' 1980-05-22'), 99);
insert into some_table values(trim('Miki'), trim(' Orange street'), trim(' 1985-01-01'), 97);
dmi@dmi-laptop:~/my_awk$
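
The generated statements break if a value itself contains a single quote. A possible refinement, sketched below, doubles every single quote inside a value (the standard SQL way of escaping it) and strips the spaces after the commas at split time, so trim() is no longer needed. The function sql_quote, the variable q, and the row with O'Neil are assumptions made for this example; some_table comes from the commands above.

```shell
sql=$(printf '%s\n' \
    'Name, Address, Birthday, Mark' \
    'John, Green street, 2000-01-01, 100' \
    "Pat, O'Neil road, 1990-03-04, 95" |
awk -F', *' -v q="'" '
# Wrap a value in single quotes, doubling any single quote inside it.
function sql_quote(s) { gsub(q, q q, s); return q s q }

NR > 1 {
    printf "insert into some_table values(%s, %s, %s, %d);\n",
           sql_quote($1), sql_quote($2), sql_quote($3), $4
}')
printf '%s\n' "$sql"
```

Passing the quote character in with -v q="'" avoids escape sequences inside the program text. For anything beyond a one-off data load, the parameterized statements of a database client are the safer choice.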