I've only done the parsing so far but it warrants a look all on its own. Reading the description I feel fully vindicated in my advocacy of parser combinators all month. I've been tempted a couple of times to build a full-blown lexer/parser combination but have gotten by so far with simple character parsers. Today let's do the full thing.
The difference is that the lexer stage differentiates interesting parts of the input from delimiters and unwanted information. In a real language you often filter out comments here. The result is a sequence of tokens rather than characters, and you write the parser in terms of a grammar of tokens.
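That character-to-token split can be sketched quickly. This is not the post's actual lexer (that's Kotlin on jparsec, shown later); it's a minimal Python stand-in to show what "a sequence of tokens rather than characters" means:

```python
import re

# A minimal lexer: emit words, integers and punctuation as tokens,
# and silently skip everything else (whitespace, line breaks).
TOKEN_RE = re.compile(r"\d+|[A-Za-z]+|[();:,]")

def lex(text):
    return TOKEN_RE.findall(text)

print(lex("18 units each with\n729 hit points"))
# → ['18', 'units', 'each', 'with', '729', 'hit', 'points']
```

The parser then only ever sees that flat token list, never the raw text.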
First a simple model. We'll start with some type aliases - I don't like to see native or primitive types in real data models. The structure closely mirrors the input format.
Lexer

The lexer matches all interesting words and symbols in the input and ignores the rest. This is really useful as there are irregular line splits and other whitespace differences in the text. The lexer gets rid of all that.
I like to add a helper function for matching tokens in the parser stage. And since we have runs of tokens in the input I'm going to make an extra one which breaks a string into tokens and matches them in sequence. This allows me to write tokens("these words with any whitespace or line breaks") in the parser, and it'll do just what that says.
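As a sketch of what such a helper does - this is a Python stand-in with a toy token-stream parser, not the Kotlin/jparsec version the post uses:

```python
# A "parser" here is a function (toks, pos) -> (value, new_pos),
# raising NoMatch on failure. These names are illustrative.
class NoMatch(Exception):
    pass

def token(expected):
    def parse(toks, pos):
        if pos < len(toks) and toks[pos] == expected:
            return expected, pos + 1
        raise NoMatch(expected)
    return parse

def tokens(phrase):
    # Split the phrase on any whitespace and match each word in
    # sequence - exactly what the helper is described as doing.
    parsers = [token(w) for w in phrase.split()]
    def parse(toks, pos):
        for p in parsers:
            _, pos = p(toks, pos)
        return phrase, pos
    return parse

value, pos = tokens("units each with")(["17", "units", "each", "with"], 1)
print(value, pos)  # → units each with 4
```

Because the lexer already discarded whitespace, the phrase matches no matter how the original text was wrapped.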
Parser

The parser is slightly different from before. We're no longer consuming characters but tokens produced by the lexer. e.g. the word damage is a single token now, not six separate characters to be matched in sequence.
Terminals.IntegerLiteral.TOKENIZER is one kind of token. Its corresponding parser is:
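The shape of that parser can be sketched in Python (a stand-in; the real one wraps jparsec's Terminals.IntegerLiteral): it accepts one integer token and maps its text to an actual number.

```python
# Stand-in for the integer-token parser: accept one all-digit token
# and map it to an Int-like value.
class NoMatch(Exception):
    pass

def integer(toks, pos):
    if pos < len(toks) and toks[pos].isdigit():
        return int(toks[pos]), pos + 1
    raise NoMatch("integer")

print(integer(["17", "units"], 0))  # → (17, 1)
```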
Let's start in the middle of the parser, at army units. The text says things like "17 units each with", and I'm going to define the rest after that as the unit description. So it's just the sequence of things in the description. The major complication is the weaknesses and immunities group in parentheses, so I factored that out.
val unit: Parser<ArmyUnit> = sequence(
    integer,
    tokens("hit points").next(weaknessesAndImmunitiesOrEmpty),
    tokens("with an attack that does").next(attack),
    tokens("at initiative").next(integer),
    { hitPoints, wi, attack, initiative ->
        ArmyUnit(hitPoints, attack, initiative, wi.weaknesses.toSet(), wi.immunities.toSet())
    })
Looking at the input text, the weaknesses and immunities can be in either order, or one can be missing, or the whole group might be omitted. Good luck dealing with that with your hand-rolled parsers, but it's easy stuff for combinators. There are a few ways you could do it - the "weak to" and "immune to" tokens could qualify the following lists, then these would be sorted into their appropriate places when building the overall structure. I chose to keep the data structure simple and handle the various cases at the parser level, since there are only a few:
data class WeaknessesAndImmunities(
    val weaknesses: Collection<AttackType>,
    val immunities: Collection<AttackType>)

val commaList = Terminals.Identifier.PARSER.sepBy1(token(","))
val weaknesses = tokens("weak to").next(commaList)
val immunities = tokens("immune to").next(commaList)

val weaknessesAndImmunities: Parser<WeaknessesAndImmunities> = or(
    sequence(weaknesses, token(";").next(immunities), { w, i -> WeaknessesAndImmunities(w, i) }),
    sequence(immunities, token(";").next(weaknesses), { i, w -> WeaknessesAndImmunities(w, i) }),
    weaknesses.map { w -> WeaknessesAndImmunities(w, setOf()) },
    immunities.map { i -> WeaknessesAndImmunities(setOf(), i) }
).between(token("("), token(")"))
There's a lot going on here. Terminals.Identifier is one of our token types - it's a word not predefined by the grammar, such as a variable name in a programming language. A commaList is one or more of these words separated by comma tokens. Remember the whitespace is dealt with in the lexer.
Weaknesses and immunities are just the identifying prefixes followed by comma lists.
The overall structure has four options:
Weaknesses followed by a semicolon then immunities
Immunities followed by a semicolon then weaknesses
Just weaknesses
Just immunities
In each case we build the data structure out of the parts available, using an empty set for anything missing. And finally at the very end the parser is augmented with between(token("("), token(")")) which matches (and skips) the parentheses surrounding the group.
What if the whole section is omitted, however? .optional() turns a parser into one which matches its grammar, or nothing at all. If nothing is matched we get a null result, which is easily replaced with an empty data structure.
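That null-to-empty step looks like this as a Python sketch (names hypothetical, continuing the toy parser convention):

```python
# If the wrapped parser fails, consume no input and return an empty
# (weaknesses, immunities) pair instead of a null.
class NoMatch(Exception):
    pass

EMPTY = ((), ())  # (weaknesses, immunities)

def optional_or_empty(parser):
    def parse(toks, pos):
        try:
            return parser(toks, pos)
        except NoMatch:
            return EMPTY, pos  # matched nothing; position unchanged
    return parse

def always_fail(toks, pos):
    raise NoMatch("(")

value, pos = optional_or_empty(always_fail)(["17", "units"], 0)
print(value, pos)  # → ((), ()) 0
```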
Building up the rest of the data structure is relatively easy, just more combinations of parsers all the way up to the full grammar:
val attack: Parser<Attack> = sequence(
    integer,
    Terminals.Identifier.PARSER.followedBy(token("damage")),
    ::Attack)

val group: Parser<ArmyGroup> = sequence(
    integer,
    tokens("units each with").next(unit),
    ::ArmyGroup)

fun battleSide(name: String): Parser<List<ArmyGroup>> =
    tokens(name).next(token(":")).next(group.many1())

val battle: Parser<Battle> = sequence(
    battleSide("immune system"),
    battleSide("infection"),
    { a, b -> Battle(a.toSet(), b.toSet()) })
Parsing the example input works. That was fun. I'll be back later for the battle simulation!

Part 1

It's another iterative simulation with a bunch of details to implement. I chose to do it completely functionally with immutable data, with the overall battle state updated on each step and threaded through the entire calculation. As before I simulated the whole battle as a sequence of states, and the part 1 result can be obtained from the last in this sequence:
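The shape of that sequence-of-states idea, as a Python sketch with a toy step function (the real step is the Kotlin fight() below):

```python
# The battle as a lazy sequence of states; the simulation stops when
# one side has no units left, and part 1 reads off the final state.
def battle_states(state, step, over):
    while True:
        yield state
        if over(state):
            return
        state = step(state)

# Toy state: (immune units, infection units); infection loses 2 a round.
states = battle_states((10, 4),
                       step=lambda s: (s[0], s[1] - 2),
                       over=lambda s: 0 in s)
print(list(states)[-1])  # → (10, 0)
```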
The fight() function must therefore compute the next state for the battle. First it does target selection, the result of which is a map of attacking group IDs to defending group IDs. I used IDs as the data objects are immutable and will be getting replaced with modified versions as the battle proceeds.
Target selection needs a set of unselected groups and the set of selections to be threaded through the computation so I wrapped both in a container data structure and gave it a helper function for adding new selections:
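A Python sketch of that container (the post's actual version is a Kotlin data class; these names are illustrative):

```python
from dataclasses import dataclass

# The not-yet-targeted defenders plus the attacker -> defender
# selections made so far, threaded immutably through the computation.
@dataclass(frozen=True)
class SelectionState:
    unselected: frozenset  # defender ids still available
    selections: tuple      # ((attacker_id, defender_id), ...)

    def select(self, attacker_id, defender_id):
        # Helper for adding a new selection: returns a fresh state
        # with the defender removed from the pool.
        return SelectionState(
            self.unselected - {defender_id},
            self.selections + ((attacker_id, defender_id),))

s = SelectionState(frozenset({1, 2, 3}), ())
s = s.select(10, 2)
print(sorted(s.unselected), s.selections)  # → [1, 3] ((10, 2),)
```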
Selecting targets and fighting involves a few sorts and orderings, so I used Kotlin Comparators again, along with a couple of simple helpers that capture the business logic quite neatly. Once we've done the target selection we run the fight. We take the target selection, order the attackers by initiative and put them back into a (now ordered) list of (attacker, defender) pairs:
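For example (Python sketch, with made-up numbers):

```python
# Turn the attacker -> defender map into a list of pairs, ordered by
# the attacker's initiative, highest first.
initiative = {1: 5, 2: 9, 3: 7}   # group id -> initiative (made up)
selections = {1: 3, 2: 1, 3: 2}   # attacker id -> defender id

ordered = sorted(selections.items(),
                 key=lambda pair: initiative[pair[0]], reverse=True)
print(ordered)  # → [(2, 1), (3, 2), (1, 3)]
```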
Running the attacks is pretty simple but we have to be careful to use the current state at each step. groups is the set of Army Groups at the beginning of the fight round, groups_ is the set at the current point in the attack sequence. One nice aspect of this style of design is that by swapping those around it's trivial to change the behaviour from sequential to parallel (i.e. all attacks happen at once), should that requirement change.
orderedAttackIds.fold(groups) { groups_, (attackingGroupId, defendingGroupId) ->
    val attackingGroup = groups_.find(attackingGroupId)
    val defendingGroup = groups_.find(defendingGroupId)
    if (attackingGroup == null || defendingGroup == null) // might be eliminated
        groups_
    else {
        val otherGroups = groups_.filterNot { g -> g.id == defendingGroupId }.toSet()
        val damage = defendingGroup.damageFrom(attackingGroup.unit.attack) * attackingGroup.count
        val unitsKilled = minOf(damage / defendingGroup.unit.hitPoints, defendingGroup.count)
        if (unitsKilled == defendingGroup.count)
            otherGroups
        else
            otherGroups + (defendingGroup - unitsKilled)
    }
}
Part 2

Find the minimum boost that lets the reindeer win? Sounds like a job for a binary search to me. Let's write a binary search that finds the minimum in a range that satisfies a predicate:
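The Kotlin version isn't shown here, but such a search can be sketched as follows (assuming the predicate is monotonic: false everywhere below the answer, true at and above it):

```python
def min_satisfying(lo, hi, pred):
    # Smallest value in [lo, hi] for which pred holds, assuming pred
    # flips from False to True exactly once across the range.
    while lo < hi:
        mid = (lo + hi) // 2
        if pred(mid):
            hi = mid        # mid works: answer is mid or lower
        else:
            lo = mid + 1    # mid fails: answer is strictly above
    return lo

print(min_satisfying(0, 10_000, lambda boost: boost * boost >= 2_000))  # → 45
```

For this puzzle the predicate is "the immune system wins the battle with this boost"; each test of the predicate runs a full part 1 simulation.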
We'll need some boosting helpers, and a predicate to test a candidate for use with the search; the answer then becomes a matter of finding the minimum boost and running part 1 again with that boost. And yet again, I have a solution that works for the example and not for the real data. Probably the most frustrating aspect of Advent of Code.

One thing that burned me was that the example data is missing a LOT of edge cases the real data will have. For example, the real data:
has opposing groups where one is immune to the other's attacks
can actually result in deadlocks if the last two remaining groups are immune to each other, or if their damage/unit-count combination isn't enough to kill one opposing unit.
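Those deadlocks matter for part 2: a boosted battle can run forever. One way to guard against it, sketched in Python with a toy round function (the real round is the Kotlin fight()):

```python
def simulate(state, step):
    # state: (immune units, infection units); step runs one round.
    while state[0] > 0 and state[1] > 0:
        nxt = step(state)
        if nxt == state:      # no units killed this round: deadlock
            return None       # signal a draw rather than loop forever
        state = nxt
    return state

# Toy round in which neither side can hurt the other:
print(simulate((10, 10), lambda s: s))  # → None
```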
Yes I find that annoying. But I guess it's like the real world too, and the answer is almost always in the detail of the text description, even if it takes several reads to jump out at you.
Yeah, a parser generator was definitely the right way to do it. I was stubborn and stuck to the naive way of doing it, which only worked because I noticed the only variable part was the special weaknesses/immunities bit. So I yanked that out,
and then processed the simple tokens the simple way:
units = parse(Int, tokens[1])  # this is the easy one
health = parse(Int, tokens[5])
damage = parse(Int, tokens[13])
attack_type = tokens[14]
initiative = parse(Int, tokens[end])
and then processed the special weaknesses/immunities tokens separately. By the end I kinda ran out of good variable names and was just using bits as a name, heh. And then at the end, I could just return a new Group object - the first arg is the team, and the second arg is the number, more on that below. Assigning teams was done in a parent method that knew about the entire input:
function parse_input(input)
    groups::Array{Group} = []
    team = "Immune System"
    number = 1
    for line in input
        if line == ""
            continue
        end
        if line == "Immune System:"
            continue
        end
        if line == "Infection:"
            team = "Infection"
            number = 1
            continue
        end
        g = parse_group(line)
        g.number = number
        g.team = team
        push!(groups, g)
        number += 1
    end
    return groups
end
I also assigned a group number so I could get similar output to the sample output.
I got it in the end. I missed the sentence "If it cannot deal any defending groups damage, it does not choose a target". A simple extra clause in the target selection fixed it.
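The shape of that clause, as a Python sketch (damage_to is hypothetical, and real target selection also tie-breaks on effective power and initiative, which this ignores):

```python
# A defender is only a candidate if the attacker would actually deal
# damage; with no damageable candidates, the attacker picks no target.
def choose_target(attacker, candidates, damage_to):
    hittable = [d for d in candidates if damage_to(attacker, d) > 0]
    if not hittable:
        return None
    return max(hittable, key=lambda d: damage_to(attacker, d))

dmg = {("A", "x"): 0, ("A", "y"): 5}
print(choose_target("A", ["x", "y"], lambda a, d: dmg[(a, d)]))  # → y
print(choose_target("A", ["x"], lambda a, d: dmg[(a, d)]))       # → None
```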