Sergio Marcial

AWK an old-school tool today

What is AWK?

AWK is a command-line programming language (some might call it a tool) oriented primarily toward text and file processing. A few simple yet elegant lines of AWK can replace many lines of a more general-purpose language like Java or Node.js without losing their intent.

In essence, AWK code is so simple that you can throw it away once it has finished its work.

% awk 'BEGIN { print "Hello World" }'
Hello World

But there is much more to it than that. Given the constant need to process data files, once you get started with AWK you will stop building complete programs to process CSV or log files and reach instead for a faster, more straightforward couple of instructions:

% awk '{ print $0 }' example.txt
This is an AWK example

% awk '{ print $4, $1, $5, $3, $2 }' example.txt
AWK This example an is

% awk '{ print $1, "could be your", $4, $5 }' example.txt
This could be your AWK example

Calculations become almost ridiculously simple:

% awk '{ print $0 }' example_numbers.txt
1 2 3 testing

% awk '{ print $1 + $2 + $3, $4 }' example_numbers.txt
6 testing

% awk '{ print $2 * $3, $4 }' example_numbers.txt
6 testing

% awk '{ print $2 / $3, $4 }' example_numbers.txt
0.666667 testing

But the real potential of AWK lies beyond simple operations. With control statements, loops, and functions (plus switch statements in GNU AWK), this command-line tool is a real programming language that goes hand in hand with file-processing operations to make our lives even simpler.

For loop example:

% cat loop.awk
#!/bin/awk -f

BEGIN {
    for (i = 1; i <= 3; i++)
        print i
} 

% awk -f loop.awk
1
2
3
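The same loop construct also works on the fields of each input line. A minimal sketch, assuming a made-up one-line input, that walks the fields backwards and prints them in reverse order:

```shell
# Made-up one-line input, used only for illustration.
reversed=$(printf 'one two three\n' | awk '{
    for (i = NF; i >= 1; i--)        # NF is the number of fields on the line
        printf "%s%s", $i, (i > 1 ? " " : "\n")
}')
echo "$reversed"   # prints: three two one
```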

Why is it relevant today?

In a generation of powerful and versatile programming languages, we engineers sometimes overcomplicate problems, most commonly because we don't know the other options. Think about how many times you have developed a small Python, Node.js, or Go script to read a huge CSV file, or even built a small utility in the JVM language of your choice, and without realizing it written multiple lines of boilerplate (useless) code.

Python script to read a file line by line and print the result:

import sys


def main():
    # The path of the file to read is the first command-line argument
    filepath = sys.argv[1]

    with open(filepath) as f:
        # Number the lines starting from 1
        for index, line in enumerate(f, start=1):
            print("Line {}: {}".format(index, line.strip()))


if __name__ == '__main__':
    main()

The same with AWK, where NR is the current line number and $0 is the whole line:

awk '{ print "Line " NR ": " $0 }' example.txt

You could come up with many more examples of the difference between writing scripts in AWK and in any other language, but AWK is also pretty performant in comparison with other languages:

[Figure: AWK and its variations' performance measurements] [1]

As you can see, this old-school language (AWK was originally created in 1977) can outshine some of these more robust and modern languages at some tasks, and learning it might give you a tool you didn't even know you wanted.

First steps in AWK

Let's start by mentioning that AWK ships with macOS and virtually every Linux distribution (how cool is that?); on Windows, you have to install it (but I am pretty sure it cannot be that hard, right?).

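How do you know which version of AWK you currently have installed? The output below comes from the awk bundled with macOS; other implementations, such as GNU AWK, expect awk --version instead.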

% awk -version
awk version 20200816

Now let's start with the basics. The structure of an AWK command is pretty simple, although there are some tricks to it, especially if you want to use it for actual text processing. The basic command can be described as <condition> { action }, where the condition is optional, as we saw in the earlier example awk '{ print $0 }' example.txt, and the action is the operation you want to execute on every line that satisfies the condition.
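To make the structure concrete, here is a small sketch of the three possible forms: a condition with an action, an action alone, and a condition alone (in which case AWK prints the matching line). The three-line input is made up for illustration.

```shell
# Hypothetical three-line input, used only for illustration.
input=$(printf 'alpha 1\nbeta 2\ngamma 3\n')

# Condition and action: print the first field of lines whose second field exceeds 1.
both=$(printf '%s\n' "$input" | awk '$2 > 1 { print $1 }')

# Action only: runs for every line.
action_only=$(printf '%s\n' "$input" | awk '{ print $2 }')

# Condition only: the default action prints the whole line.
condition_only=$(printf '%s\n' "$input" | awk 'NR == 2')

printf '%s\n---\n%s\n---\n%s\n' "$both" "$action_only" "$condition_only"
```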

A condition can be any expression or regular expression, but two special conditions deserve a mention: BEGIN and END, each with its own action. Think of BEGIN as the entry point, which runs before any input is read and where you can enable, disable, or configure variables for the rest of the run. For example, to change the field delimiter from the default whitespace to a semicolon (;), add BEGIN { FS = ";" } at the beginning of the script.
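A minimal sketch of changing the delimiter in BEGIN, assuming a made-up semicolon-separated line. Note that string literals inside an AWK program take double quotes:

```shell
# Made-up record with semicolon-separated fields.
name=$(printf 'id;name;city\n' | awk 'BEGIN { FS = ";" } { print $2 }')
echo "$name"   # prints: name
```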

AWK provides several built-in variables; these eight are the ones you will use most often:

  • FILENAME - Name of the current input file
  • FS - Input field separator
  • FNR - Number of the current record within the current input file
  • NF - Number of fields in the current record
  • NR - Number of records read so far, across all input files
  • OFS - Output field separator
  • ORS - Output record separator
  • RS - Input record separator
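A short sketch of a few of these variables working together, assuming a made-up two-line input: OFS is set in BEGIN, and each line prints its record number (NR), its field count (NF), and its last field ($NF).

```shell
# Made-up two-line input; the fields are separated by spaces.
report=$(printf 'a b c\nd e\n' | awk 'BEGIN { OFS = "-" } { print NR, NF, $NF }')
echo "$report"
# prints:
# 1-3-c
# 2-2-e
```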

END, on the other hand, runs after all the input has been processed and can be used to execute any finishing commands once the main body has completed, for example, printing the final values of variables:

# Main block: runs once per input line and adds up its first three fields
{
    for (i = 1; i <= 3; i++)
        s += $i
}
# END block: runs once, after the last line has been read
END { print s }
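END is also the natural place for aggregates over the whole input. A minimal sketch, assuming a made-up input with one number per line, that prints the total and the average:

```shell
# Made-up input: one number per line.
stats=$(printf '10\n20\n30\n' | awk '{ sum += $1 } END { print sum, sum / NR }')
echo "$stats"   # prints: 60 20
```

In END, NR still holds the number of lines read, which is what makes the average a one-liner.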

Something else worth mentioning is that, besides its built-in functions such as sqrt, AWK supports the creation of custom functions for when you need more complex operations and the script starts to become hard to manage [2]. Calling a built-in function looks like this:

awk '{ print "The square root of", $1, "is", sqrt($1) }'
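A custom function is declared with the function keyword outside of any action block. The sketch below uses a hypothetical helper called square and a made-up two-line input:

```shell
# square() is a hypothetical helper defined only for this example.
squares=$(printf '2\n3\n' | awk '
function square(x) { return x * x }
{ print $1, "squared is", square($1) }')
echo "$squares"
# prints:
# 2 squared is 4
# 3 squared is 9
```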

AWK also provides arrays (with built-in operations to manage them) and many other features that we won't discuss in this post because it would take a couple of hundred lines. Still, you can find a good description of them here, so please take a look if you are curious to learn more.

Example of array operations in AWK:

Array addition

# Main block: store the first three fields of each line, keyed by position
{
    for (i = 1; i <= 3; i++)
        array[i] = $i
}
# END block: the iteration order of for-in is not guaranteed
END {
    for (position in array)
        print position ": " array[position]
}

Array deletion

# Main block: store the first three fields of each line, keyed by position
{
    for (i = 1; i <= 3; i++)
        array[i] = $i
}
# END block: remove the elements one by one
END {
    for (position in array)
        delete array[position]
}
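AWK arrays are associative, so a key can be any string, not only a number. A common use is counting occurrences. The sketch below assumes a made-up input with one word per line, and pipes the result through sort because the iteration order of for-in is not guaranteed:

```shell
# Made-up input: one word per line.
counts=$(printf 'apple\npear\napple\n' | awk '
{ count[$1]++ }                                  # the word itself is the array key
END { for (word in count) print word, count[word] }' | sort)
echo "$counts"
# prints:
# apple 2
# pear 1
```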

And in case you are thinking how powerful this is and, like me, are trying to take it further and create small AWK-powered "apps" for monotonous tasks, you may wonder how to verify that what you are coding is valid. You can run any number of unit tests against shell scripts, and therefore against AWK scripts, using shunit2.

Data processing with AWK

As mentioned a couple of times in this post, AWK's main objective is to process data, whether that is data in files, lines piped from a command's output, or any other form of input, but let's start simple.

Opening a file and reading the data

% cat example.txt
This is an AWK example

% awk '{ print $0 }' example.txt
This is an AWK example

From the previous example, we can notice a few things, like how AWK uses indexes to split the data in the file; these indexes are created using the delimiter, which by default is whitespace (check the BEGIN example earlier in this post to see how to define a new delimiter).

Using $0 prints the whole line, while $1, $2, and so on refer to the individual columns and give you control over the data.

% cat example.txt
This is an AWK example

% awk '{ print $4, $1, $5, $3, $2 }' example.txt
AWK This example an is
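The delimiter can also be set from the command line with the -F option, which is handy for simple CSV data. A minimal sketch, assuming a made-up row with no quoted or escaped commas (plain AWK field splitting does not handle those):

```shell
# Made-up CSV row; -F sets the field separator from the command line.
reordered=$(printf 'Ada,Lovelace,1815\n' | awk -F ',' '{ print $2, $1 }')
echo "$reordered"   # prints: Lovelace Ada
```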

You can also straightforwardly combine fields with strings of your own:

% cat example.txt
This is an AWK example

% awk '{ print $1, "could be your", $4, $5 }' example.txt
This could be your AWK example

Searching a value

AWK can search for information within the provided input; one way is to use a regular expression.

% cat example.txt
This is an AWK example

% awk '/This/ { print $0 }' example.txt
This is an AWK example

% awk '/test/ { print $0 }' example.txt
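A regular expression can also be matched against a single field with the ~ operator, or negated with !~. A small sketch, assuming made-up log lines:

```shell
# Made-up log lines, used only for illustration.
log=$(printf 'INFO start\nERROR disk full\nINFO done\n')

# $1 ~ /regex/ tests only the first field; !~ negates the match.
errors=$(printf '%s\n' "$log" | awk '$1 ~ /^ERR/ { print $0 }')
others=$(printf '%s\n' "$log" | awk '$1 !~ /^ERR/ { print $2 }')
printf '%s\n%s\n' "$errors" "$others"
```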

Another search mechanism is to use control statements like if inside the action, for example:

% cat example.txt
This is an AWK example

% awk '{ if ($1 == "This") print $0 }' example.txt
This is an AWK example
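An if can also take an else branch, which helps when every line needs some output. A minimal sketch, assuming a made-up input with a name and a score per line and an arbitrary pass mark of 50:

```shell
# Made-up input: a name and a score per line; 50 is an arbitrary threshold.
grades=$(printf 'ana 72\nluis 45\n' | awk '{
    if ($2 >= 50)
        print $1, "pass"
    else
        print $1, "fail"
}')
echo "$grades"
# prints:
# ana pass
# luis fail
```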

AWK, GAWK, NAWK or MAWK

Finally, as with any programming language, variants tend to appear over time, and AWK is no exception. The ones I consider the most important are the following:

  • GAWK - GNU AWK, the GNU project's open-source implementation, which is actively maintained
  • NAWK - "New AWK", the revised and extended version of the original AWK released by its authors in the 1980s [3]
  • MAWK - A fast AWK implementation whose codebase is built around a bytecode interpreter

Of course, there are many other variants out there, and you won't have any trouble finding them.

As you can see, AWK is an excellent, flexible, and robust command-line tool. It takes a while to ramp up, but once you get the basics it is pretty simple to use and to exploit its potential.

In the next post, I will go deeper into different and more complex scenarios and examples; let me know if you have any questions or comments or want more specific related content.



  1. https://brenocon.com/blog/2009/09/dont-mawk-awk-the-fastest-and-most-elegant-big-data-munging-language/ 

  2. https://www.gnu.org/software/gawk/manual/html_node/Function-Calls.html 

  3. Robbins, Arnold (March 2014). "The GNU Project and Me: 27 Years with GNU AWK" (PDF). skeeve.com. Retrieved October 4, 2014. 

Top comments (3)

Michael Salaverry • Edited

great article!

also check out

ezrosent / frawk

an efficient awk-like language

frawk

frawk is a small programming language for writing short programs processing textual data. To a first approximation, it is an implementation of the AWK language; many common Awk programs produce equivalent output when passed to frawk. You might be interested in frawk if you want your scripts to handle escaped CSV/TSV like standard Awk fields, or if you want your scripts to execute faster.

The info subdirectory has more in-depth information on frawk:

  • Overview what frawk is all about, how it differs from Awk.
  • Types: A quick gloss on frawk's approach to types and type inference.
  • Parallelism An overview of frawk's parallelism support.
  • Benchmarks A sense of the relative performance of frawk and other tools when processing large CSV or TSV files.
  • Builtin Functions Reference: A list of builtin functions implemented by frawk, including some that are new when compared with Awk.

frawk is…




for a Rust based AWK like language
Sergio Marcial • Edited

That is awesome, however, sometimes I feel like we have taken AWK a little too far or as a teammate would say probably not far enough yet

This is a Golang POSIX AWK variant

benhoyt / goawk

A POSIX-compliant AWK interpreter written in Go

GoAWK: an AWK interpreter written in Go

AWK is a fascinating text-processing language, and somehow after reading the delightfully-terse The AWK Programming Language I was inspired to write an interpreter for it in Go. So here it is, feature-complete and tested against "the one true AWK" test suite.

Read more about how GoAWK works and performs here.

Basic usage

To use the command-line version, simply use go install to install it, and then run it using goawk (assuming $GOPATH/bin is in your PATH):

$ go install github.com/benhoyt/goawk@latest
$ goawk 'BEGIN { print "foo", 42 }'
foo 42
$ echo 1 2 3 | goawk '{ print $1 + $3 }'
4

On Windows, " is the shell quoting character, so use " around the entire AWK program on the command line, and use ' around AWK strings -- this is a non-POSIX extension to make…

MiguelMJ

I love AWK