DEV Community

Rudolf Olah
Rudolf Olah

Posted on

The Power of Tries, Data Structure Optimization in Emacs

The following article, originally published in 2009, was a combination of both prose and code, written in Emacs with a tool called "noweb". I have updated it to not use the special syntax of TeX and noweb. Noweb is a "literate programming" tool, where the usual interplay between comments and code is changed into a complex layering of code and explanations and deep-dives. With the noweb tool, you can export the code into an assembled ordering. This prose/code text can be run through the noweb tool and produce an emacs lisp file. The code that's exported was in blocks that begin with <<>>= and end with @. This updated version includes the assembled code at the end. Zencoding was a way to rapidly generate HTML using CSS selectors.

zencoding-trie

To improve the performance of zencoding-mode, caching/memoization of previously entered input needed to be done. At first, a naive hash-table implementation was chosen and this greatly cut down the speed with which zencoding generated HTML. Then it was realized that a trie data structure would be more efficient as the hash-table was storing partially entered input as well. A trie could store the partially entered data when needed and use less space than a hash-table.

A trie is a tree where each node represents a character in a string. When you reach a leaf node, a word or sentence is spelled out. Each node can contain a value.

Important: the key is not stored in the node, but in the parent of that node.

The only operations we require for our trie are retrieve value and insert value.

A node is a sequence of two elements. The first element is some value, in our case the cached version of zencoding's AST (Abstract Syntax Tree) output. If this is NIL then there is no value stored in this node. The second element is a sequence of branches (more nodes).

(defun make-zencoding-trie-node ()
  "Creates a vector of two elements. The first element is some value, the second element is a character table for storing more branches of the trie. The value is initially NIL."
  (vector nil (make-char-table 'trie-table)))
Enter fullscreen mode Exit fullscreen mode

We need some functions for setting and getting the values of our node type. All of the functions accept a node as an argument which must be a sequence type with at least two elements. The value argument can be anything, typically something non-NIL. The branch-key argument must be a character.

(defun zencoding-trie-node-value (node)
  (aref node 0))

(defun zencoding-trie-node-set-value (node value)
  (aset node 0 value))

(defun zencoding-trie-node-branches (node)
  (aref node 1))

(defun zencoding-trie-node-branch (node branch-key)
  (aref (zencoding-trie-node-branches node) branch-key)) 
Enter fullscreen mode Exit fullscreen mode

We need a function to access a particular branch in a node. If the branch does not exist, it will be created.

(defun zencoding-trie-node-create-branch (node branch-key)
  (aset (zencoding-trie-node-branches node)
    branch-key
    (make-zencoding-trie-node)))

(defun zencoding-trie-node-brancher (node branch-key)
  (let ((branch (aref (zencoding-trie-node-branches node) branch-key)))
    (if branch
    branch
      ;; branch doesn't exist, so create it and then return it.
      (zencoding-trie-node-create-branch node branch-key))))
Enter fullscreen mode Exit fullscreen mode

The retrieval function is given a string s to search for and the trie to search within. Where n is the length of the string s, the argument i is bounded like so: 0 < i <= n.

We are moving forward through a string till we reach its end. The current character is s[i - 1]. If the current node we are visiting in the trie contains the current character as one of its branches, then we have to visit it.

When the end of s has been reached, i = n, then we have reached the correct node and can return it (it will be created if it doesn't already exist). Before this, we will have to keep on searching for the right node.

If we reach the end of the trie before the correct node has been found, we return NIL.

The traverse function is a generalized version which includes a function argument, f, telling us how to select the next branch to visit.

(defun zencoding-trie-traverse (s trie i f)
  (if (null trie)
      nil
    (let ((branch (funcall f trie (aref s (1- i)))))
      (if (or (null branch) (= i (length s)))
      branch
    (zencoding-trie-traverse s branch (1+ i) f)))))

(defun zencoding-trie-retrieve (s trie)
  (zencoding-trie-traverse s trie 1 'zencoding-trie-node-branch))
Enter fullscreen mode Exit fullscreen mode

Next we have a function for inserting a string and its value into a trie. The pre-conditions are that s is a type of string, value and trie are non-NIL.

We use the previously defined retrieve operation to find the final node and then insert the value there. Fortunately for us, the retrieve operation takes care of creating new branches if necessary.

(defun zencoding-trie-insert (s value trie)
  "`s' is the string used to navigate the trie (each character is a node in the trie). `value' is the value we want to insert. `trie' is the node where we start the insertion."
  (zencoding-trie-node-set-value
   (zencoding-trie-traverse s trie 1 'zencoding-trie-node-brancher)
   value))
Enter fullscreen mode Exit fullscreen mode

The final file looks like this:

;;; zencoding-trie.el --- A trie data structure built for zencoding-mode
;;
;; Copyright (C) 2009, Rudolf Olah
;;
;; Author: Rudolf Olah
;;
;; This file is free software; you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation; either version 3, or (at your option)
;; any later version.
;;
;; This file is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;; GNU General Public License for more details.
;;
;; You should have received a copy of the GNU General Public License
;; along with GNU Emacs; see the file COPYING.  If not, write to
;; the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
;; Boston, MA 02110-1301, USA.
;;

(defun make-zencoding-trie-node ()
  "Creates a vector of two elements. The first element is some value, the second element is a character table for storing more branches of the trie. The value is initially NIL."
  (vector nil (make-char-table 'trie-table)))

(defun zencoding-trie-node-value (node)
  (aref node 0))

(defun zencoding-trie-node-set-value (node value)
  (aset node 0 value))

(defun zencoding-trie-node-branches (node)
  (aref node 1))

(defun zencoding-trie-node-branch (node branch-key)
  (aref (zencoding-trie-node-branches node) branch-key))

(defun zencoding-trie-node-create-branch (node branch-key)
  (aset (zencoding-trie-node-branches node)
    branch-key
    (make-zencoding-trie-node)))

(defun zencoding-trie-node-brancher (node branch-key)
  (let ((branch (aref (zencoding-trie-node-branches node) branch-key)))
    (if branch
    branch
      ;; branch doesn't exist, so create it and then return it.
      (zencoding-trie-node-create-branch node branch-key))))

(defun zencoding-trie-traverse (s trie i f)
  (if (null trie)
      nil
    (let ((branch (funcall f trie (aref s (1- i)))))
      (if (or (null branch) (= i (length s)))
      branch
    (zencoding-trie-traverse s branch (1+ i) f)))))

(defun zencoding-trie-retrieve (s trie)
  (zencoding-trie-traverse s trie 1 'zencoding-trie-node-branch))

(defun zencoding-trie-insert (s value trie)
  "`s' is the string used to navigate the trie (each character is a node in the trie). `value' is the value we want to insert. `trie' is the node where we start the insertion."
  (zencoding-trie-node-set-value
   (zencoding-trie-traverse s trie 1 'zencoding-trie-node-brancher)
   value))

(provide 'zencoding-trie)

;; test code
(let ((x (make-zencoding-trie-node)))
  (zencoding-trie-insert "ha" 1 x)
  (zencoding-trie-insert "hi" 2 x)
  (zencoding-trie-retrieve "ha" x))
Enter fullscreen mode Exit fullscreen mode

Top comments (1)

Collapse
 
rudolfolah profile image
Rudolf Olah

Cloudflare has a great article on using tries at scale: blog.cloudflare.com/pingora-saving...