# 100 Languages Speedrun: Episode 65: Randomized Finite Automaton for Fast Thue Interpreter in Crystal

This is probably the most Computer Science heavy episode so far. If that's not your thing, feel free to skip and come back for the next episode.

Deterministic Finite Automaton is a computer science concept. It's a program with these properties:

• at every point it's in one of limited number of states
• it goes through the string one character at a time
• based on current state, current character, and nothing else, it choses the next state
• once it's done with the string, we extract some information from whichever state it ended up in

### DFA Example

So for example let's write a program that matches a string of digits. It can have some extra spaces at the beginning and end, and it can have `_` between digits (but not elsewhere, an not multiples), and it's not allowed to have leading 0s unless the whole number is zero. In regular expression terms it's `/^\s*(0|[1-9]\d*(_\d+)*)\s*\$/`.

Let's try to make a DFA for it:

• state `Start`, if character `space`, go to state `Start`:
• state `Start`, if character `1-9`, go to state `Digit`:
• state `Start`, if character `0`, go to state `End`:
• state `Digit`, if character `0-9`, go to state `Digit`
• state `Digit`, if character `space`, go to state `End`
• state `Digit`, if character `_`, go to state `Underscore`
• state `Undersor`, if character `0-9`, go to state `Digit`
• state `End`, if character `space` go to state `End`

Any state, any character not listed, go to state `Fail`.

If we reached end of the string, and state is either `Digit` or `End`, the string matches. Otherwise it doesn't. Hopefully I didn't mess it up.

### Other Finite Automata

DFAs are nice because it's obvious how to make them super fast - it's just one table lookup per character. It's also possible to combine them - OR, AND, and NOT of any number of DFAs is still a DFA (even if potentially a much bigger one).

This is all nice, but we usually want to know more than "matches" or "doesn't match" is. So we came up with so damn many variants of the DFA idea - including the engine behind regular expressions.

What we want for Thue is something like that, except:

• we want to know which rule matched
• we want to know where it matched
• we want to know exactly one of all possible matches

So I came up with Randomized Finite Automaton - a DFA variant that's perfect for Thue.

### Trie

First, let's organize all the rules into a "trie". Trie is like a tree-like structure which lets us lookup a lot of strings at once. At root of the tree is a hash with keys being all the first characters. Then at every node there's some data for strings that finish there, and more tries for strings that continue.

For example a trie for this collection of strings and heir associated data: `{"no": 1, "no": 2, "not": 3, "nu": 4}` would be:

• `root` has `data: []`, and children `{'n': trie_n}`
• `trie_n` has `data: []`, and children `{'o': trie_no, 'u': trie_nu}`
• `trie_nu` has `data: `, and children `{}`
• `trie_no` has `data: [1, 2]`, and children `{"t": trie_not}`
• `trie_not` has `data: `, and children `{}`

The goal of this is that we can have very big number of rules, and we match them all at once, instead of trying every single rule. If we have thousands of rules, this can be a lot faster than hash table based solution, since trie-based solution can just look at one character and instantly eliminate hundreds or thousands of potential matches, and only the relevant ones stay.

For example if we have this trie, and string "maybe", we do a check for `root.children['m']`, `root.children['a']`, `root.children['y']`, `root.children['b']`, `root.children['e']`, get empty 5 times, and we're done. No matter how many rules starting with `n` or whatnot we had.

I found one Trie implementation for Crystal but it wasn't doing quite what I wanted. I wanted multiple data per node, it had just one. It wouldn't be too hard to adapt, but tries are super simple so I just wrote my own implementation:

``````class Trie(T)
getter :data

def initialize
@data = [] of T
@children = Hash(Char, Trie(T)).new
end

def insert(str : String, data : T)
if str.empty?
@data.push(data)
else
c = str
@children[c] ||= Trie(T).new
@children[c].insert(str[1..-1], data)
end
end

def [](c : Char)
@children[c]?
end
end
``````

### RFA

Now there are two ways to go forward. The first (NFA style), is to remember every partially matched trie. This can potentially mean checking N tries for every character if maximum length of the rule is N.

The other would be to precalculate every combination (DFA style). In principle that would be faster as we guarantee just one lookup per character. The cost would be extra calculation, and potentially a lot bigger tries.

If we expect rules to be fairly short (let's say 10 characters or less), even if there are thousands of rules in our Thue program, then the NFA style solution is just better. If we expect rules to be very long, then DFA solution would win, but I don't think Thue programs would have very big rules.

NFA solution would also be better at ignoring fake rules - like if you use impossible rules as comments (`# this is a comment ::= it will never be matched`), NFA solution is pretty much unaffected, while DFA solution would have significantly bigger state.

So here's the Randomized Finite Automaton - the core of this episode:

``````class RFA
def initialize(@rules : Array(ThueRule))
@trie = Trie(ThueRule).new
@rules.each do |rule|
@trie.insert(rule.left, rule)
end
end

# No empty matches allowed
def random_match(str : String)
count = 0
active = [@trie]
match = nil

str.chars.each_with_index do |char, idx|
next_tries = active.map{|t| t[char]}.compact
matching_rules = next_tries.flat_map(&.data)

unless matching_rules.empty?
count += matching_rules.size
if rand(count) < matching_rules.size
rule = matching_rules.sample
match = {rule: rule, idx: (idx - rule.left.size + 1)}
end
end

active = [@trie, *next_tries]
end

match
end
end
``````

In the constructor we just insert every rule into the main trie.

As we match, we go character by character, and remember every potential trie. That's the main trie, plus any trie which we started matching already. The number of that is bound by length of the longest rule, but in principle there would be very few tries at the same time. (DFA-style solution would have only 1 trie, basically result of merging those NFA tries).

Then we go through all the tries and get all the rules matching at current character.

Now here's the fun part - we could use it to generate list of all possible matches in the string, but that's not what we want, we just want one. So we know we had N matches so far, and M matches at current character. We pick one of M at random, then we roll M / (N + M) to decide if we want to keep new or old match.

The final thing we need to adjust is subtract number of characters minus one from the match. The RFA gives us address of last matching character, but it's generally more convenient to know the first. All rules have fixed number of characters, so it's very easy.

### Complete Thue Interpreter

Here's the whole program:

``````#!/usr/bin/env crystal

require "./trie"

class ThueRule
getter :left

def initialize(@left : String, @right : String)
@right = "~\n" if @right == "~"
end

def apply(str, idx)
before = str[0, idx]
after = str[idx+@left.size .. -1]

if @right == '~'
print @right[1..-1]
replacement = ""
elsif @right == ":::"
replacement = STDIN.gets.not_nil!.chomp
else
replacement = @right
end

before + replacement + after
end

def to_s(io)
io << "Rule<#{@left.inspect}::=#{@right.inspect}>"
end
end

class ThueSideParser
getter :results
@toparse : Array(Char)

def initialize(@str : String)
@results = [""]
@toparse = @str.chars
parse
end

private def parse
until @toparse.empty?
case @toparse
when '['
chars = parse_range
if @results.size == 1
@results = chars.map{|c| @results+c}
elsif @results.size == chars.size
@results = @results.zip(chars).map{|s,c| s+c}
else
raise "Sizes of character classes mismatch in #{@str}"
end
else
c = parse_character
@results = @results.map{|s| s + c}
end
end
@results
end

private def parse_character
if @toparse == '\\'
@toparse.shift
raise "Unmatched \\ in #{@str}" if eos?
c = @toparse.shift
case c
when 'n'
'\n'
when 's'
' '
else
c
end
else
@toparse.shift
end
end

private def parse_range
chars = [] of Char
@toparse.shift
loop do
raise "Character range never closed in #{@str}" if eos?
if @toparse == ']'
@toparse.shift
return chars
end
c = parse_character
raise "Character range never closed in #{@str}" if eos?
if @toparse == '-'
@toparse.shift
e = parse_character
raise "Invalid character range in #{@str}" if e < c
chars.concat(c..e)
else
chars << c
end
end
end

private def eos?
@toparse.empty?
end
end

class ThueRuleParser
def initialize(@str : String)
if @str =~ /\A(.*)::=(.*)\z/
@valid = true
@left = \$1
@right = \$2
else
@left = ""
@right = ""
@valid = false
end
end

def valid_rule?
@valid
end

def empty_rule?
@valid && @left.empty?
end

def call
lefts = ThueSideParser.new(@left).results
rights = ThueSideParser.new(@right).results

# Support N-to-1 and 1-to-N rules
lefts *= rights.size if lefts.size == 1
rights *= lefts.size if rights.size == 1

unless lefts.size == rights.size
raise "Mismatched side of rule #{@str}"
end

lefts.zip(rights).map do |left, right|
ThueRule.new(left, right)
end
end
end

class RFA
def initialize(@rules : Array(ThueRule))
@trie = Trie(ThueRule).new
@rules.each do |rule|
@trie.insert(rule.left, rule)
end
end

# No empty matches allowed
def random_match(str : String)
count = 0
active = [@trie]
match = nil

str.chars.each_with_index do |char, idx|
next_tries = active.map{|t| t[char]}.compact
matching_rules = next_tries.flat_map(&.data)

unless matching_rules.empty?
count += matching_rules.size
if rand(count) < matching_rules.size
rule = matching_rules.sample
match = {rule: rule, idx: (idx - rule.left.size + 1)}
end
end

active = [@trie, *next_tries]
end

match
end
end

class ThueProgram
def initialize
@rules = [] of ThueRule
@initial = ""
@state = ""
end

while lines.size > 0
line, line_no = lines.shift
# Ignoring invalid rules, they are sometimes used as comments
parser = ThueRuleParser.new(line)
next unless parser.valid_rule?
break if parser.empty_rule?
@rules.concat parser.call
end

@rfa = RFA.new(@rules)

@initial = lines.map(&.first).join("\n")
end

def run(debug)
@state = @initial
if debug
@rules.each do |rule|
STDERR.puts rule
end
end

while match = @rfa.not_nil!.random_match(@state)
rule = match[:rule]
idx = match[:idx]
if debug
STDERR.puts "Applying rule #{rule} at #{idx} to #{@state.inspect}"
end
@state = rule.apply(@state, idx)
end

if debug
STDERR.puts "No more matches. Final state: #{@state.inspect}"
end
end
end

unless ARGV.size == 1
STDERR.puts "Usage: #{PROGRAM_NAME} <file.thue>"
exit 1
end

prog = ThueProgram.new

# Crystal doesn't handle SIGPIPE well and we want to support:
# crystal thue.cr examples/fizzbuzz.thue | head -n 100
begin
prog.run(!!ENV["DEBUG"]?)
rescue e : IO::Error
exit if e.os_error == Errno::EPIPE
raise e
end
``````

### Performance

Doing just this change, we got decent performance improvement, 51s to 21s on 100k FizzBuzz Thue program:

``````\$ time ./thue_rx.cr examples_rx/fizzbuzz.thue | head -n 100000 >/dev/null
./thue_rx.cr examples_rx/fizzbuzz.thue  51.50s user 0.81s system 101% cpu 51.601 total
head -n 100000 > /dev/null  0.16s user 0.39s system 1% cpu 51.590 total

\$ time ./thue_rfa.cr examples_rx/fizzbuzz.thue | head -n 100000 >/dev/null
./thue_rfa.cr examples_rx/fizzbuzz.thue  21.47s user 13.90s system 165% cpu 21.418 total
head -n 100000 > /dev/null  0.11s user 0.21s system 1% cpu 21.408 total
``````

Comparing release builds, the difference is consistent but small, 41s to 39s on 500k FizzBuzz Thue program. Both finish 100k FizzBuzz in ~7s:

``````\$ crystal build --release thue_rfa.cr
\$ crystal build --release thue_rx.cr
\$ time ./thue_rx examples_rx/fizzbuzz.thue | head -n 500000 >/dev/null
./thue_rx examples_rx/fizzbuzz.thue  41.05s user 3.38s system 106% cpu 41.762 total
head -n 500000 > /dev/null  0.45s user 0.88s system 3% cpu 41.760 total
\$ time ./thue_rfa examples_rx/fizzbuzz.thue | head -n 500000 >/dev/null
./thue_rfa examples_rx/fizzbuzz.thue  39.44s user 64.53s system 272% cpu 38.119 total
head -n 500000 > /dev/null  0.52s user 0.95s system 3% cpu 38.117 total
``````

I'm not sure if this counts as a win or not. It's very big improvement on the development build, but small one on the release build. It's definitely going to be more significant when running Thue programs with a huge number of rules, I guess FizzBuzz with less than 100 rules didn't really benefit from that.

There's probably a lot of small optimizations that can be applied to `RFA#random_match`, even without precomputing a single big trie.

### Code Really nice post, as usual!

Some small things to further improve the performance:

• Use `str.each_char_with_index` instead of `str.chars.each_with_index` (the former doesn't allocate memory while the latter does)
• Use `active.compact_map { |t| t[char] }` instead of `active.map { |t| t[char] }.compact` (one less intermediate array)
• Do something like `active.clear; active.push @trie; active.concat next_tries` instead of creating a new array for each char

That said, I totally understand that the goal is to improve performance compared to before while also keeping the code as readable as possible. I'm just suggesting these because you said "There's probably a lot small optimizations" and because I like optimizing things :-) Tomasz Wegrzanowski

Oh for sure. I think the biggest performance savings would come from switching from processing characters to processing bytes, as in UTF-8 this works perfectly well without any changes, and then we could just use 256 entry arrays instead of hashes for the trie.

Not like this really makes much difference for programs with so few rules.

DEV Community