Tomasz Wegrzanowski

Posted on Jan 22, 2022

100 Languages Speedrun: Episode 65: Randomized Finite Automaton for Fast Thue Interpreter in Crystal

#crystal #ruby #thue

This is probably the most Computer Science heavy episode so far. If that's not your thing, feel free to skip and come back for the next episode.

Deterministic Finite Automaton is a computer science concept. It's a program with these properties:

at every point it's in one of limited number of states
it goes through the string one character at a time
based on current state, current character, and nothing else, it choses the next state
once it's done with the string, we extract some information from whichever state it ended up in

DFA Example

So for example let's write a program that matches a string of digits. It can have some extra spaces at the beginning and end, and it can have _ between digits (but not elsewhere, an not multiples), and it's not allowed to have leading 0s unless the whole number is zero. In regular expression terms it's /^\s*(0|[1-9]\d*(_\d+)*)\s*$/.

Let's try to make a DFA for it:

state Start, if character space, go to state Start:
state Start, if character 1-9, go to state Digit:
state Start, if character 0, go to state End:
state Digit, if character 0-9, go to state Digit
state Digit, if character space, go to state End
state Digit, if character _, go to state Underscore
state Undersor, if character 0-9, go to state Digit
state End, if character space go to state End

Any state, any character not listed, go to state Fail.

If we reached end of the string, and state is either Digit or End, the string matches. Otherwise it doesn't. Hopefully I didn't mess it up.

Other Finite Automata

DFAs are nice because it's obvious how to make them super fast - it's just one table lookup per character. It's also possible to combine them - OR, AND, and NOT of any number of DFAs is still a DFA (even if potentially a much bigger one).

This is all nice, but we usually want to know more than "matches" or "doesn't match" is. So we came up with so damn many variants of the DFA idea - including the engine behind regular expressions.

What we want for Thue is something like that, except:

we want to know which rule matched
we want to know where it matched
we want to know exactly one of all possible matches

So I came up with Randomized Finite Automaton - a DFA variant that's perfect for Thue.

Trie

First, let's organize all the rules into a "trie". Trie is like a tree-like structure which lets us lookup a lot of strings at once. At root of the tree is a hash with keys being all the first characters. Then at every node there's some data for strings that finish there, and more tries for strings that continue.

For example a trie for this collection of strings and heir associated data: {"no": 1, "no": 2, "not": 3, "nu": 4} would be:

root has data: [], and children {'n': trie_n}
trie_n has data: [], and children {'o': trie_no, 'u': trie_nu}
trie_nu has data: [4], and children {}
trie_no has data: [1, 2], and children {"t": trie_not}
trie_not has data: [3], and children {}

The goal of this is that we can have very big number of rules, and we match them all at once, instead of trying every single rule. If we have thousands of rules, this can be a lot faster than hash table based solution, since trie-based solution can just look at one character and instantly eliminate hundreds or thousands of potential matches, and only the relevant ones stay.

For example if we have this trie, and string "maybe", we do a check for root.children['m'], root.children['a'], root.children['y'], root.children['b'], root.children['e'], get empty 5 times, and we're done. No matter how many rules starting with n or whatnot we had.

I found one Trie implementation for Crystal but it wasn't doing quite what I wanted. I wanted multiple data per node, it had just one. It wouldn't be too hard to adapt, but tries are super simple so I just wrote my own implementation:

class Trie(T)
  getter :data

  def initialize
    @data = [] of T
    @children = Hash(Char, Trie(T)).new
  end

  def insert(str : String, data : T)
    if str.empty?
      @data.push(data)
    else
      c = str[0]
      @children[c] ||= Trie(T).new
      @children[c].insert(str[1..-1], data)
    end
  end

  def [](c : Char)
    @children[c]?
  end
end

RFA

Now there are two ways to go forward. The first (NFA style), is to remember every partially matched trie. This can potentially mean checking N tries for every character if maximum length of the rule is N.

The other would be to precalculate every combination (DFA style). In principle that would be faster as we guarantee just one lookup per character. The cost would be extra calculation, and potentially a lot bigger tries.

If we expect rules to be fairly short (let's say 10 characters or less), even if there are thousands of rules in our Thue program, then the NFA style solution is just better. If we expect rules to be very long, then DFA solution would win, but I don't think Thue programs would have very big rules.

NFA solution would also be better at ignoring fake rules - like if you use impossible rules as comments (# this is a comment ::= it will never be matched), NFA solution is pretty much unaffected, while DFA solution would have significantly bigger state.

So here's the Randomized Finite Automaton - the core of this episode:

class RFA
  def initialize(@rules : Array(ThueRule))
    @trie = Trie(ThueRule).new
    @rules.each do |rule|
      @trie.insert(rule.left, rule)
    end
  end

  # No empty matches allowed
  def random_match(str : String)
    count = 0
    active = [@trie]
    match = nil

    str.chars.each_with_index do |char, idx|
      next_tries = active.map{|t| t[char]}.compact
      matching_rules = next_tries.flat_map(&.data)

      unless matching_rules.empty?
        count += matching_rules.size
        if rand(count) < matching_rules.size
          rule = matching_rules.sample
          match = {rule: rule, idx: (idx - rule.left.size + 1)}
        end
      end

      active = [@trie, *next_tries]
    end

    match
  end
end

In the constructor we just insert every rule into the main trie.

As we match, we go character by character, and remember every potential trie. That's the main trie, plus any trie which we started matching already. The number of that is bound by length of the longest rule, but in principle there would be very few tries at the same time. (DFA-style solution would have only 1 trie, basically result of merging those NFA tries).

Then we go through all the tries and get all the rules matching at current character.

Now here's the fun part - we could use it to generate list of all possible matches in the string, but that's not what we want, we just want one. So we know we had N matches so far, and M matches at current character. We pick one of M at random, then we roll M / (N + M) to decide if we want to keep new or old match.

The final thing we need to adjust is subtract number of characters minus one from the match. The RFA gives us address of last matching character, but it's generally more convenient to know the first. All rules have fixed number of characters, so it's very easy.

Complete Thue Interpreter

Here's the whole program:

#!/usr/bin/env crystal

require "./trie"

class ThueRule
  getter :left

  def initialize(@left : String, @right : String)
    @right = "~\n" if @right == "~"
  end

  def apply(str, idx)
    before = str[0, idx]
    after = str[idx+@left.size .. -1]

    if @right[0] == '~'
      print @right[1..-1]
      replacement = ""
    elsif @right == ":::"
      replacement = STDIN.gets.not_nil!.chomp
    else
      replacement = @right
    end

    before + replacement + after
  end

  def to_s(io)
    io << "Rule<#{@left.inspect}::=#{@right.inspect}>"
  end
end

class ThueSideParser
  getter :results
  @toparse : Array(Char)

  def initialize(@str : String)
    @results = [""]
    @toparse = @str.chars
    parse
  end

  private def parse
    until @toparse.empty?
      case @toparse[0]
      when '['
        chars = parse_range
        if @results.size == 1
          @results = chars.map{|c| @results[0]+c}
        elsif @results.size == chars.size
          @results = @results.zip(chars).map{|s,c| s+c}
        else
          raise "Sizes of character classes mismatch in #{@str}"
        end
      else
        c = parse_character
        @results = @results.map{|s| s + c}
      end
    end
    @results
  end

  private def parse_character
    if @toparse[0] == '\\'
      @toparse.shift
      raise "Unmatched \\ in #{@str}" if eos?
      c = @toparse.shift
      case c
      when 'n'
        '\n'
      when 's'
        ' '
      else
        c
      end
    else
      @toparse.shift
    end
  end

  private def parse_range
    chars = [] of Char
    @toparse.shift
    loop do
      raise "Character range never closed in #{@str}" if eos?
      if @toparse[0] == ']'
        @toparse.shift
        return chars
      end
      c = parse_character
      raise "Character range never closed in #{@str}" if eos?
      if @toparse[0] == '-'
        @toparse.shift
        e = parse_character
        raise "Invalid character range in #{@str}" if e < c
        chars.concat(c..e)
      else
        chars << c
      end
    end
  end

  private def eos?
    @toparse.empty?
  end
end

class ThueRuleParser
  def initialize(@str : String)
    if @str =~ /\A(.*)::=(.*)\z/
      @valid = true
      @left = $1
      @right = $2
    else
      @left = ""
      @right = ""
      @valid = false
    end
  end

  def valid_rule?
    @valid
  end

  def empty_rule?
    @valid && @left.empty?
  end

  def call
    lefts = ThueSideParser.new(@left).results
    rights = ThueSideParser.new(@right).results

    # Support N-to-1 and 1-to-N rules
    lefts *= rights.size if lefts.size == 1
    rights *= lefts.size if rights.size == 1

    unless lefts.size == rights.size
      raise "Mismatched side of rule #{@str}"
    end

    lefts.zip(rights).map do |left, right|
      ThueRule.new(left, right)
    end
  end
end

class RFA
  def initialize(@rules : Array(ThueRule))
    @trie = Trie(ThueRule).new
    @rules.each do |rule|
      @trie.insert(rule.left, rule)
    end
  end

  # No empty matches allowed
  def random_match(str : String)
    count = 0
    active = [@trie]
    match = nil

    str.chars.each_with_index do |char, idx|
      next_tries = active.map{|t| t[char]}.compact
      matching_rules = next_tries.flat_map(&.data)

      unless matching_rules.empty?
        count += matching_rules.size
        if rand(count) < matching_rules.size
          rule = matching_rules.sample
          match = {rule: rule, idx: (idx - rule.left.size + 1)}
        end
      end

      active = [@trie, *next_tries]
    end

    match
  end
end

class ThueProgram
  def initialize
    @rules = [] of ThueRule
    @initial = ""
    @state = ""
  end

  def load(path)
    lines = File.read_lines(path).map(&.chomp).zip(1..)

    while lines.size > 0
      line, line_no = lines.shift
      # Ignoring invalid rules, they are sometimes used as comments
      parser = ThueRuleParser.new(line)
      next unless parser.valid_rule?
      break if parser.empty_rule?
      @rules.concat parser.call
    end

    @rfa = RFA.new(@rules)

    @initial = lines.map(&.first).join("\n")
  end

  def run(debug)
    @state = @initial
    if debug
      @rules.each do |rule|
        STDERR.puts rule
      end
    end

    while match = @rfa.not_nil!.random_match(@state)
      rule = match[:rule]
      idx = match[:idx]
      if debug
        STDERR.puts "Applying rule #{rule} at #{idx} to #{@state.inspect}"
      end
      @state = rule.apply(@state, idx)
    end

    if debug
      STDERR.puts "No more matches. Final state: #{@state.inspect}"
    end
  end
end

unless ARGV.size == 1
  STDERR.puts "Usage: #{PROGRAM_NAME} <file.thue>"
  exit 1
end

prog = ThueProgram.new
prog.load(ARGV[0])

# Crystal doesn't handle SIGPIPE well and we want to support:
# crystal thue.cr examples/fizzbuzz.thue | head -n 100
begin
  prog.run(!!ENV["DEBUG"]?)
rescue e : IO::Error
  exit if e.os_error == Errno::EPIPE
  raise e
end

Performance

Doing just this change, we got decent performance improvement, 51s to 21s on 100k FizzBuzz Thue program:

$ time ./thue_rx.cr examples_rx/fizzbuzz.thue | head -n 100000 >/dev/null
./thue_rx.cr examples_rx/fizzbuzz.thue  51.50s user 0.81s system 101% cpu 51.601 total
head -n 100000 > /dev/null  0.16s user 0.39s system 1% cpu 51.590 total

$ time ./thue_rfa.cr examples_rx/fizzbuzz.thue | head -n 100000 >/dev/null
./thue_rfa.cr examples_rx/fizzbuzz.thue  21.47s user 13.90s system 165% cpu 21.418 total
head -n 100000 > /dev/null  0.11s user 0.21s system 1% cpu 21.408 total

Comparing release builds, the difference is consistent but small, 41s to 39s on 500k FizzBuzz Thue program. Both finish 100k FizzBuzz in ~7s:

$ crystal build --release thue_rfa.cr
$ crystal build --release thue_rx.cr
$ time ./thue_rx examples_rx/fizzbuzz.thue | head -n 500000 >/dev/null
./thue_rx examples_rx/fizzbuzz.thue  41.05s user 3.38s system 106% cpu 41.762 total
head -n 500000 > /dev/null  0.45s user 0.88s system 3% cpu 41.760 total
$ time ./thue_rfa examples_rx/fizzbuzz.thue | head -n 500000 >/dev/null
./thue_rfa examples_rx/fizzbuzz.thue  39.44s user 64.53s system 272% cpu 38.119 total
head -n 500000 > /dev/null  0.52s user 0.95s system 3% cpu 38.117 total

I'm not sure if this counts as a win or not. It's very big improvement on the development build, but small one on the release build. It's definitely going to be more significant when running Thue programs with a huge number of rules, I guess FizzBuzz with less than 100 rules didn't really benefit from that.

There's probably a lot of small optimizations that can be applied to RFA#random_match, even without precomputing a single big trie.

Code

All code examples for the series will be in this repository.

Code for the Better Thue Interpreter in Crystal episode is available here.

Oldest comments (2)

Ary Borenszweig • Jan 22 '22

Really nice post, as usual!

Some small things to further improve the performance:

Use str.each_char_with_index instead of str.chars.each_with_index (the former doesn't allocate memory while the latter does)
Use active.compact_map { |t| t[char] } instead of active.map { |t| t[char] }.compact (one less intermediate array)
Do something like active.clear; active.push @trie; active.concat next_tries instead of creating a new array for each char

That said, I totally understand that the goal is to improve performance compared to before while also keeping the code as readable as possible. I'm just suggesting these because you said "There's probably a lot small optimizations" and because I like optimizing things :-)

Tomasz Wegrzanowski • Jan 22 '22

Oh for sure. I think the biggest performance savings would come from switching from processing characters to processing bytes, as in UTF-8 this works perfectly well without any changes, and then we could just use 256 entry arrays instead of hashes for the trie.

Not like this really makes much difference for programs with so few rules.