It is a common challenge in technical interviews to parse Quake 3 server logs and display:
- Players in a match
- Player score card, listing player names and kill count:
  - Ignore <world> as a player
  - If <world> kills a player, add -1 to that player's kill count
- (optional) Group outputs above by match
- (optional) Death cause report by match
Working with files is common practice for any developer. Using awk is not, even though it is IMHO one of the best tools for the job:
- The language is built for (1) text matching and (2) text manipulation.
- Working with very large files is as easy as working with small ones.
To spread knowledge of the tool to more people, let's solve the challenge with AWK and see how you can start using it effectively in your workflow today. I assume you know a programming language well, know your way around a (*nix) CLI, and are using GNU awk.
The beginning of a not so usual program
As is common with other Unix tools, it is better to break the program into smaller pieces: awk programs bigger than ~150 lines are difficult to maintain.
Here are the different programs we are going to create:
- clean.awk will read the input files, which are the original log files, and output a cleaner version of their content, containing just the data we need to manipulate and use.
- scoreboard.awk will use the output from the previous program to produce the scoreboard for each game.
Let's create a walking skeleton to run and debug our progress while tackling the challenge:
$ mkdir /tmp/awk-quake
$ cd !$
$ curl --remote-name -L https://gist.githubusercontent.com/augustohp/073936cc213fe96bc99a498932c18be7/raw/9e52e4da221f2f0ce1dfc11f57c1679a2cdb77f5/qgames.log
$ tail qgames.log
13:55 Kill: 3 4 6: Oootsimo killed Dono da Bola by MOD_ROCKET
13:55 Exit: Fraglimit hit.
13:55 score: 20 ping: 8 client: 3 Oootsimo
13:55 score: 19 ping: 14 client: 6 Zeh
13:55 score: 17 ping: 1 client: 2 Isgalamido
13:55 score: 13 ping: 0 client: 5 Assasinu Credi
13:55 score: 10 ping: 8 client: 4 Dono da Bola
13:55 score: 6 ping: 19 client: 7 Mal
14:11 ShutdownGame:
14:11 ------------------------------------------------------------
$ cat clean.awk
{ print }
$ watch gawk -f clean.awk qgames.log
Above we:
- Downloaded qgames.log
- Created clean.awk, which prints everything passed to it
- Executed the program every couple of seconds (with watch) to see its result while we change it in another session (to stop watch, use CTRL-C)
Let's change clean.awk to filter just the lines useful to us, and help us debug what to do with them:
BEGIN {
    FS = " "
    RS = "\n"   # record (line) separator; newline is already the default
}
/Init/ { print }
/kill/ { debug_fields() }
function debug_fields()
{
    for (i = 1; i <= NF; i++) {
        printf("%d: %s\n", i, $i)
    }
}
Don't despair yet, what we are doing is pretty simple:
- BEGIN is a special block that gets executed once, before any input is read. We use it to (re-)define some special variables:
  - FS defines the field separator (a single space, which makes awk split on runs of whitespace). It is used to break each line into an array of fields.
  - RS defines the record separator (a newline, which is also the default). Everything up to that character is treated as a line. The name matters: awk does not treat a variable called LFS as special, so assigning to it changes nothing.
- /match/ { action } blocks execute a set of actions when a match (regex supported) is found:
  - /Init/ { print } prints every line that has Init on it, without doing anything more.
  - /kill/ { debug_fields() } executes the debug_fields() function for every line that contains the string kill. Matching is case-sensitive, so it is the word killed on those lines that matches, not the Kill: prefix.
  - Every line that doesn't match the rules above is ignored.
- function debug_fields() prints all fields identified after breaking the line with FS:
  - NF is a special variable containing the number of fields parsed for the current line.
  - $n is the n-th field. Inside the loop, $i becomes $1, $2, $3 and so on up to $NF, allowing us to retrieve the contents of every field on that line, displaying something like:

    1: 20:54
    2: Kill:
    3: 1022
    4: 2
    5: 22:
    6: <world>
    7: killed
    8: Isgalamido
    9: by
    10: MOD_TRIGGER_HURT

The output above is useful to debug the contents of the current line. Try changing the debug_fields() action to print $6 " killed " $8.
With little changes, we can use $6 (killer) and $8 (killed) to display who killed whom, which is pretty much everything we need.
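As a quick check of the field numbering, here is a minimal sketch that feeds awk two sample lines from the shell. The kill line is the one debugged above; the bare InitGame line and its timestamp are simplified assumptions, not copied from the log.

```shell
# Two sample lines: a simplified InitGame marker and a kill line
sample='20:37 InitGame:
20:54 Kill: 1022 2 22: <world> killed Isgalamido by MOD_TRIGGER_HURT'

out=$(printf '%s\n' "$sample" | awk '
/Init/   { next }   # pattern matched: skip the line, run no further rules
/killed/ { print NF " fields: " $6 " killed " $8 }
')
printf '%s\n' "$out"
# prints: 10 fields: <world> killed Isgalamido
```

Only the second line reaches the print action, and it splits into 10 fields because every run of spaces separates two fields.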
🐛 If player names did not contain spaces, we'd be done. But Assasinu Credi, for example, breaks our algorithm because we use spaces to separate fields. When he kills someone, $8 will be killed instead of the other player's name.
Let's see this happening:
BEGIN {
    FS = " "
    RS = "\n"   # record (line) separator; newline is already the default
}
/Init/ { next }
/Assas/ { print $6 " killed " $8 }
The program above ignores (with the next action) lines matching Init and prints the killer and killed fields just for lines matching Assas.
$ awk -f clean.awk qgames.log
Zeh killed Assasinu
<world> killed Assasinu
Isgalamido killed Assasinu
Zeh killed Assasinu
Assasinu killed killed
Note that the Assasinu killed killed line is wrong: it doesn't have the name of the killed player. Let's fix this!
Making things more reliable with regex
The final clean.awk program is below. It replaces some strings with the nfs (new field separator) variable and removes the prefix on lines that report a kill:
BEGIN {
    FS = " "
    RS = "\n"   # record (line) separator; newline is already the default
    nfs = "|"
    current_game = 0
}
/Init/ { current_game++ }
/kill/ {
    sub(/^[ 0-9:]+ Kill: [0-9: ]+/, "", $0)
    sub(/ killed /, nfs, $0)
    sub(/ by /, nfs, $0)
    print $0 nfs current_game
}
- The BEGIN section declares 2 new variables:
  - nfs separates output by something other than spaces, so the next programs easily support player names that contain them.
  - current_game is a variable that gets incremented every time a new game starts.
- /Init/ marks a new game:
  - Increments the variable current_game for the next time it gets used
- For every /kill/:
  - sub(regex, replacement, target) replaces the first match of regex in target with replacement, modifying target in place (gsub() is the variant that replaces every match). $0 is the whole current line.
  - sub(/^[ ... removes the prefix of the line up to the player name.
  - sub(/ killed... and sub(/ by... replace these matches with nfs (the new field separator). Once the output is read with | as the field separator, we can easily identify the killer ($1), who got killed ($2) and how he got killed ($3).
  - print prints the current line ($0) with the current game as a suffix:
    - As every sub() modifies the current line ($0), we now have only what we need.
    - As awk programs operate on lines, it is easier to have everything we need on them. That is why we add the current game to every line.
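To watch the three sub() calls work on a single line, here is a minimal sketch that runs them on one kill line from the tail of the log. It hardcodes | instead of using the nfs variable, and the second half uses a made-up string only to contrast sub() with gsub().

```shell
line='13:55 Kill: 3 4 6: Oootsimo killed Dono da Bola by MOD_ROCKET'

cleaned=$(printf '%s\n' "$line" | awk '
{
    sub(/^[ 0-9:]+ Kill: [0-9: ]+/, "")   # no target given, so $0 is edited
    sub(/ killed /, "|")
    sub(/ by /, "|")
    print
}')
printf '%s\n' "$cleaned"
# prints: Oootsimo|Dono da Bola|MOD_ROCKET

# sub() stops after the first match, gsub() keeps going
first=$(echo 'a by b by c' | awk '{ sub(/ by /, "|"); print }')
every=$(echo 'a by b by c' | awk '{ gsub(/ by /, "|"); print }')
printf '%s\n%s\n' "$first" "$every"
# prints: a|b by c
#         a|b|c
```

Because sub() only touches the first match, a player whose name happened to contain " killed " or " by " would still confuse this approach; that case is assumed not to occur here.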
Executing clean.awk on the log produces:
$ awk -f clean.awk qgames.log | tee qgames-clean.log
<world>|Isgalamido|MOD_TRIGGER_HURT|2
<world>|Isgalamido|MOD_TRIGGER_HURT|2
<world>|Isgalamido|MOD_TRIGGER_HURT|2
Isgalamido|Mocinha|MOD_ROCKET_SPLASH|2
Isgalamido|Isgalamido|MOD_ROCKET_SPLASH|2
Isgalamido|Isgalamido|MOD_ROCKET_SPLASH|2
<world>|Isgalamido|MOD_TRIGGER_HURT|2
<world>|Isgalamido|MOD_TRIGGER_HURT|2
<world>|Isgalamido|MOD_TRIGGER_HURT|2
<world>|Isgalamido|MOD_FALLING|2
<world>|Isgalamido|MOD_TRIGGER_HURT|2
Isgalamido|Mocinha|MOD_ROCKET|3
<world>|Zeh|MOD_TRIGGER_HURT|3
With the qgames-clean.log file we can now easily achieve every objective of the original challenge without having to deal with:
- Unneeded context.
- Space separators. With FS = "|" we use | as field separator and have:
  - $1 as the killer
  - $2 as who got killed
  - $3 as the cause of death
  - $4 as the game in which that happened
We also gain a "checkpoint": if the log changes format, or we discover a bug, the next programs keep working as long as we produce output conforming to the current format.
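A minimal sketch of reading the cleaned format follows. The first row is taken from the output above; the second row is made up (its game number is an assumption) to show that a name with spaces now stays in one field.

```shell
# Row 1 comes from the cleaned output; row 2 is a hypothetical example
rows='Isgalamido|Mocinha|MOD_ROCKET|3
Oootsimo|Dono da Bola|MOD_ROCKET|21'

report=$(printf '%s\n' "$rows" | awk -F'|' '
{ printf("game %s: %s killed %s (%s)\n", $4, $1, $2, $3) }')
printf '%s\n' "$report"
# prints: game 3: Isgalamido killed Mocinha (MOD_ROCKET)
#         game 21: Oootsimo killed Dono da Bola (MOD_ROCKET)
```

The -F option on the command line does the same job as setting FS in a BEGIN block.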
Next steps
How about you try to figure out the rest? I will post my solution and, if you learned something from this post, I promise you will learn something else from the next one as well.
The GNU awk manual is really good, from a time when technical documents were worth reading. You don't need to read everything; the index will take you where you need to go. Pinky promise!
I won't leave you without anything though, here is a beginning for scoreboard.awk:
BEGIN {
    FS = "|"
}
{
    # Sets a player as a key in the players array
    players[$1] = $1
    players[$2] = $2
}
END {
    # Skips <world>, which is not a real player
    for (name in players) {
        if (name == "<world>")
            continue
        print name
    }
}
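If you want a hint for the next step, here is one possible sketch of kill counting, not necessarily the solution the next post will use. The three input rows are made up in the cleaned format. It assumes a self-kill counts as a normal kill and it does not group by game yet; both are choices the challenge leaves open.

```shell
# Hypothetical cleaned rows in the killer|killed|cause|game format
rows='<world>|Isgalamido|MOD_TRIGGER_HURT|2
Isgalamido|Mocinha|MOD_ROCKET_SPLASH|2
Isgalamido|Mocinha|MOD_ROCKET|2'

scoreboard=$(printf '%s\n' "$rows" | awk '
BEGIN { FS = "|" }
{
    kills[$2] += 0          # list the victim even if they never score
    if ($1 == "<world>")
        kills[$2]--         # killed by <world>: the victim loses a point
    else
        kills[$1]++
}
END {
    for (name in kills)
        print name ": " kills[name]
}' | sort)
printf '%s\n' "$scoreboard"
# prints: Isgalamido: 1
#         Mocinha: 0
```

The for (name in kills) loop visits keys in no guaranteed order, which is why the output is piped through sort.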
Let me know your solutions, suggestions or questions in the comments! ❤️