Cleaning Quake server logs to generate score boards

#bash #shell #awk #data

It is a common challenge for technical interviews to parse Quake 3 server logs and display:

Players in a match
Player score card, listing player names and kill count:
1. Ignore <world> as a player
2. If <world> kills a player, subtract 1 from that player's kill count
(optional) Group outputs above by match
(optional) Death cause report by match

Working with files is common practice for any developer. Using awk is not, even though it is IMHO one of the best tools for the job:

The language is built for (1) text matching and (2) manipulation.
Working with very large files is as easy as working with small ones.

To spread knowledge of the tool to more people, let's solve the challenge with AWK and see how you can start using it in your workflow today. I assume you know a programming language well, know your way around a (*nix) CLI, and that we are using GNU awk.

The beginning of a not so usual program

As is common with other Unix tools, it is better to break the program into smaller pieces: awk programs bigger than ~150 lines are difficult to maintain.
Here are the different programs we are going to create:

clean.awk will read the input files, which are the original log files, and output a cleaner version of their content, containing just the data we need to manipulate and use.
scoreboard.awk will use the output from the previous program to produce the score boards for each game.

Let's create a walking skeleton to run and debug our progress while tackling the challenge:

$ mkdir /tmp/awk-quake
$ cd !$
$ curl --remote-name -L https://gist.githubusercontent.com/augustohp/073936cc213fe96bc99a498932c18be7/raw/9e52e4da221f2f0ce1dfc11f57c1679a2cdb77f5/qgames.log
$ tail qgames.log
 13:55 Kill: 3 4 6: Oootsimo killed Dono da Bola by MOD_ROCKET
 13:55 Exit: Fraglimit hit.
 13:55 score: 20  ping: 8  client: 3 Oootsimo
 13:55 score: 19  ping: 14  client: 6 Zeh
 13:55 score: 17  ping: 1  client: 2 Isgalamido
 13:55 score: 13  ping: 0  client: 5 Assasinu Credi
 13:55 score: 10  ping: 8  client: 4 Dono da Bola
 13:55 score: 6  ping: 19  client: 7 Mal
 14:11 ShutdownGame:
 14:11 ------------------------------------------------------------
$ cat clean.awk
{ print }
$ watch gawk -f clean.awk qgames.log

Above we:

Downloaded qgames.log
Created clean.awk that prints everything passed to it
Executed the program every couple of seconds (with watch) to see its result while we change it in another session (to stop watch, use CTRL-C)

Let's change clean.awk to filter just the lines useful to us, and help us debug what to do with them:

BEGIN {
    FS = " "
    RS = "\n"   # record separator; a newline is already the default
}
/Init/ { print }
/kill/ { debug_fields() }

function debug_fields(    i)   # i is listed as an extra parameter to keep it local
{
    for (i = 1; i <= NF; i++) {
        printf("%d: %s\n", i, $i)
    }
}

Don't despair yet, what we are doing is pretty simple:

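BEGIN is a special block that gets executed once, before any input is read:
1. We use it to (re-)define some special variables:
  1. FS defines the field separator (space). It is used to break each line into fields. A single space is the default and means "split on runs of blanks, ignoring leading and trailing ones".
  2. RS defines the record separator (new line). Everything up to that character is treated as one line (a record). Note that awk has no built-in LFS variable; assigning to that name only creates an ordinary variable.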
/match/ { action } blocks execute a set of actions when a match (regex supported) is found:
1. /Init/ { print } prints every line that has Init on it, without doing anything more.
2. /kill/ { debug_fields() } executes the debug_fields() function for every line containing the string kill. Matching is case-sensitive, so this hits the word killed, not the Kill: prefix.
3. Every line that doesn't match the rules above is ignored.
function debug_fields() prints all fields identified after breaking the line with FS:
1. NF is a special variable containing the number of fields parsed for the current line.
2. $n is the n-th field of the current line. Inside the loop $i becomes $1,
  $2, $3 and so on up to $NF, allowing us to retrieve the contents of every field on that
  line, displaying something like:
```
1: 20:54
2: Kill:
3: 1022
4: 2
5: 22:
6: <world>
7: killed
8: Isgalamido
9: by
10: MOD_TRIGGER_HURT
```
3. The output above is useful to debug which parts of the current line we can work with. Try replacing the debug_fields() action with print $6 " killed " $8.
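
If you want to poke at field splitting without the whole log, here is a minimal sketch that uses a single line rebuilt from the debug output above. The shell variables line and out are just names picked for this illustration.

```shell
# One kill line in the same shape as the ones in qgames.log.
line=' 20:54 Kill: 1022 2 22: <world> killed Isgalamido by MOD_TRIGGER_HURT'

# With the default FS, leading blanks are ignored and runs of blanks
# separate fields, so $6 is the killer and $8 is the victim.
out=$(printf '%s\n' "$line" | awk '/killed/ { print NF ": " $6 " killed " $8 }')
printf '%s\n' "$out"
```

It prints 10: <world> killed Isgalamido, that is, the number of fields followed by the two names.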

With little changes, we can use $6 (killer) and $8 (killed) to display who killed whom, which is pretty much everything we need.

🐛 If player names did not contain spaces we'd be done. But Assasinu Credi, for example, breaks our algorithm because we use spaces to separate fields.
When he kills someone, $8 will be killed instead of the other player's name.

Let's see this happening:

BEGIN {
    FS = " "
    RS = "\n"   # record separator; a newline is already the default
}
/Init/ { next }
/Assas/ { print $6 " killed " $8 }

The program above ignores (with next action) lines matching Init and prints just lines matching Assas.

$ awk -f clean.awk qgames.log
Zeh killed Assasinu
<world> killed Assasinu
Isgalamido killed Assasinu
Zeh killed Assasinu
Assasinu killed killed

Note that the last line, Assasinu killed killed, is wrong: it doesn't have the name of the killed player. The other lines also cut Assasinu Credi down to Assasinu. Let's fix this!

Making things more reliable with regex

The final clean.awk program is below. It replaces some strings with the nfs (new field separator) variable and removes the prefix from the lines that report a kill:

BEGIN {
    FS = " "
    RS = "\n"   # record separator; a newline is already the default
    nfs = "|"
    current_game = 0
}
/Init/ { current_game++ }
/kill/ {
    sub(/^[ 0-9:]+ Kill: [0-9: ]+/, "", $0)
    sub(/ killed /, nfs, $0)
    sub(/ by /, nfs, $0)
    print $0 nfs current_game
}

The BEGIN section declares 2 new variables:
1. nfs to separate output by something other than spaces, so next programs easily support player names with them.
2. current_game is a variable that gets incremented every time a new game starts.
/Init/ marks a new game:
1. Increments the variable current_game for the next time it gets used
For every /kill/:
1. sub(regex, replacement, target) replaces the first match of regex in target with replacement, modifying target in place (gsub() would replace every match). $0 is the whole current line.
2. sub(/^[ ... removes the prefix of the line, up to the killer's name.
3. sub(/ by... and sub(/ killed... replace these matches with nfs (the new field separator), allowing us to easily identify ($1) the killer, ($2) who got killed and ($3) how they got killed.
4. print will print the current line ($0) with the current game as a suffix:
  - As every sub() replaces the current line ($0), we now have only what we needed.
  - As awk programs operate on lines, it is easier to have everything we need on them. That is why we add current game to every line.
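
To see the three substitutions at work on a name with spaces, here is a small sketch that runs them over one line from the log. The game number passed in as game=3 is made up for the example, and nfs is passed with -v instead of being set in BEGIN.

```shell
# A kill line from qgames.log whose victim has spaces in the name.
line=' 13:55 Kill: 3 4 6: Oootsimo killed Dono da Bola by MOD_ROCKET'

out=$(printf '%s\n' "$line" | awk -v nfs='|' -v game=3 '
/killed/ {
    sub(/^[ 0-9:]+ Kill: [0-9: ]+/, "")   # drop timestamp, "Kill:" and the numeric ids
    sub(/ killed /, nfs)                  # first " killed " becomes the separator
    sub(/ by /, nfs)                      # first " by " becomes the separator
    print $0 nfs game                     # append the (hypothetical) game number
}')
printf '%s\n' "$out"
```

The result is Oootsimo|Dono da Bola|MOD_ROCKET|3: the victim's name survives intact because the separators are now the words around it, not the spaces inside it.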

Executing the program above produces:

$ awk -f clean.awk qgames.log | tee qgames-clean.log
<world>|Isgalamido|MOD_TRIGGER_HURT|2
<world>|Isgalamido|MOD_TRIGGER_HURT|2
<world>|Isgalamido|MOD_TRIGGER_HURT|2
Isgalamido|Mocinha|MOD_ROCKET_SPLASH|2
Isgalamido|Isgalamido|MOD_ROCKET_SPLASH|2
Isgalamido|Isgalamido|MOD_ROCKET_SPLASH|2
<world>|Isgalamido|MOD_TRIGGER_HURT|2
<world>|Isgalamido|MOD_TRIGGER_HURT|2
<world>|Isgalamido|MOD_TRIGGER_HURT|2
<world>|Isgalamido|MOD_FALLING|2
<world>|Isgalamido|MOD_TRIGGER_HURT|2
Isgalamido|Mocinha|MOD_ROCKET|3
<world>|Zeh|MOD_TRIGGER_HURT|3

With the qgames-clean.log file we can now easily achieve every objective of the original challenge without having to deal with:

Unneeded context.
Space separators. With FS = "|" we use | as field separator and have:
- $1 as the killer
- $2 who got killed
- $3 how the kill happened (the cause of death)
- $4 in which game that happened
We also gain a "checkpoint". If the log changes format, or we discover a bug, the next programs keep working as long as we produce output conforming to the current format.
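
As a quick sanity check of the cleaned format, the sketch below feeds three lines taken from the output above into awk with | as the field separator. The output layout is just one I picked for illustration.

```shell
# Three cleaned lines, copied from the qgames-clean.log output.
out=$(printf '%s\n' \
    '<world>|Isgalamido|MOD_TRIGGER_HURT|2' \
    'Isgalamido|Mocinha|MOD_ROCKET_SPLASH|2' \
    'Isgalamido|Mocinha|MOD_ROCKET|3' |
awk -F'|' '{ printf("game %s: %s -> %s (%s)\n", $4, $1, $2, $3) }')
printf '%s\n' "$out"
```

Each line comes back as game, killer, victim and cause, for example game 2: Isgalamido -> Mocinha (MOD_ROCKET_SPLASH), with no guessing about where a name starts or ends.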

Next steps

How about you try to figure out the rest? I will post my solution and, if you learned something from this, I promise you will learn something else on the next one as well.

The GNU awk manual is really good, from a time when technical documents were worth reading. You don't need to read everything; the index will take you where you need to go. Pinky promise!

I won't leave you empty-handed though. Here is a beginning for scoreboard.awk:

BEGIN {
    FS = "|"   
}
{
    # Sets a player as a key in the players array
    players[$1] = $1
    players[$2] = $2
}
END {
    # Prints every player except <world>
    for (name in players) {
        if (name == "<world>")
            continue
        print name
    }
}
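
If you want a nudge for the next step, here is one possible sketch of the kill count, not the definitive solution. It assumes the cleaned format above, sums kills across all games instead of grouping them by match, and counts a self-kill as a regular kill, which the challenge doesn't specify. The kills array name is my own pick.

```shell
# Four cleaned lines, copied from the qgames-clean.log output.
out=$(printf '%s\n' \
    '<world>|Isgalamido|MOD_TRIGGER_HURT|2' \
    'Isgalamido|Mocinha|MOD_ROCKET_SPLASH|2' \
    'Isgalamido|Mocinha|MOD_ROCKET|3' \
    '<world>|Zeh|MOD_TRIGGER_HURT|3' |
awk -F'|' '
{
    kills[$2] += 0              # make sure the victim is listed, even with 0 kills
    if ($1 == "<world>")
        kills[$2]--             # killed by <world>: the victim loses one kill
    else
        kills[$1]++             # a regular kill counts for the killer
}
END {
    # Array traversal order is unspecified in awk, so sorting happens outside.
    for (name in kills)
        print name ": " kills[name]
}' | LC_ALL=C sort)
printf '%s\n' "$out"
```

For these four lines it prints Isgalamido: 1, Mocinha: 0 and Zeh: -1, one per line. Since <world> never becomes a key of kills, there is nothing to filter out at the end.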

Let me know your solution, suggestions or questions in the comments! ❤️
