loading...

Trie - Data Structure & Algorithm Part VI

fernandoblima profile image Fernando Baptistella ・6 min read

DataStructure & Algorithm (6 Part Series)

1) Linked List, Queue and Stack - Data Structure & Algorithm Part I 2) Dictionary  and HashTable -  Data Structure & Algorithms Part II 3 ... 4 3) Set and MultiSet -  Data Structure & Algorithm Part III 4) Disjoint Set -  Data Structure Part IV 5) Tree and Binary Search Tree - Data Structure & Algorithm Part V 6) Trie - Data Structure & Algorithm Part VI

In this post our main goal is to understand the Trie Data Structure, learning the concepts, how it works, and how to implement it (a.k.a code!).

It is important to understand the tree structure before diving into the trie. So, if you need to, you can read the last post about the tree and binary search tree.

Moving on, let’s discuss the data structure journey! 😁

💭 "Those hours of practice, and failure, are a necessary part of the learning process."- Gina Sipley

Outline

The article is divided into the following parts:

  • Understanding the Trie structure;
  • The main operations

◼️ Trie

Pre-requisite : Tree

We can say that the trie structure stores a set of strings that can be visualized like a tree where each node is a character. This structure is stored in a top to the bottom manner and the order that appears is based on the prefix of a string that all the descendants' nodes have in common.

But what do I mean about prefix? 🧐

Alt Text

Let's consider using the word 'Batman' for the set S of n strings to clarify our mind.

S1 = { B,a,t,m,a,n }

First all, the root of the structure is started with a node with value ε that represents the empty string. The next inserted node has the first value in the set S1 that is 'B'. Then, the next node to be used is the value 'a' and so on.

As we can see, each node can have several child values ​​(or not). At most, the size of the alphabet that the children are connected to, in our case can have up to 26 children.

So, let’s see an example using the word that we are discussing.

Alt Text

Figure 1: Inserted a new word

Great! Let's use this structure and add a new set that has the word 'Bat', using as the set S2 of n strings.

S2 = { B,a,t}

Here, the first letter 'B' of the set S2 has already been inserted in the first node. Therefore, we do not have to create another node, and the same happens to the letters 'a' and 't'. As a consequence, just need to mark the letter 't' as the end of a word.

See the next figure below that shows a trie with the words “Batman” and “Bat”.

Alt Text

Figure 2: Inserting a word that already have the as prefix in the structure

What's happens if we add the word 'Batgirl'?

S3 = { B,a,t,g,i,r,l}

As we discussed earlier, the structure already have the letters 'B', 'a', and 't'. So, just needs to create the node for other words. See below:

Alt Text

Figure 3: Inserting a word that already have a prefix

What if we add a word that starts with a different letter instead of 'B'? Don't worry, just need to insert a new node with a value. In this example, we will add the word 'Joker', in this way, the letter 'J' will be added after the node that represents the empty string. Keep in mind, don't forget to mark the last letter at the end of the word.

Alt Text

This happens with other words that can be added to our structure, such as Penguin, Ivy, Two-Face, and so on.

Alt Text

Figure 4: Inserting a word that start with a different first letter

After all, why should we use this structure? Why not use the tree structure? Well, the trie structure is faster compared to the tree and hash table because we do not need to calculate any hash functions or worry about dealing with collisions.

Awesome! Now that we understand the behaviors and how we can add values, let's build our structure. At first, we need to create our main class.

Talk is cheap. Let’s see the code. 😁

class TrieNode {
    constructor(isEnd, value ) {
        this.children = {};
        this.isEndOfWord = isEnd;
        this.character = value;
    }

}

class Trie {
    constructor() {
        this.root = new TrieNode(true, '*');
        this.length = 0;
    }

    ...

}

Each TrieNode represents a letter in the structure and has the following parameters:

  • children: As we discussed above, there can be more than one child.
  • isEndOfWord: Represents if the letter is the end of the word.
  • character: Is the node value.

Done! 😁

But not entirely! We need to create and add methods to our class. The implementation of insert, search and delete functions are a simpler way to implement these functions using Javascript, and all of these operations have the complexity of time O(L) where L is length of key.

Let's check out:

  • Insert

As mentioned earlier, this structure starts with a node that represents the empty string. We have to insert the first character of the set of strings, but if the value to be inserted has already been added, we just have to get down to the next level and continue to add the following values from the set.

However, if at some point there is no node, we will have to create and continue the process until the whole set is inserted, and of course, mark the last value of the set as the end of the word node. The space complexity of this structure in the worst case is when the word to be inserted is higher than the maximum number of nodes in the structure.

    insert(key){
        var currentValue = this.root;

        for (let index = 0; index < key.length; index++) {
            const element = key[index];
            if (currentValue.children[element]) {
                currentValue = currentValue.children[element];
            } else {
                this.length++;
                const newNode = new TrieNode(false, element);
                currentValue.children[element] = newNode;
                currentValue = newNode;
            }
        }
        currentValue.isEndOfWord = true;
    }
  • Search

Searching a string in this structure is a simple approach, we just have to iterate all characters of the set starting at the root and checking if the value matches and moving down to the next node. If the last letter used in the process is marked as the last node, then the set belongs to the searched word.

However, we can say that the set S is not present in the trie when:

  • There is no transition for children nodes and there is still a letter in the set.
  • If all the letters have been consumed and the last node in the process does not correspond to the string.
  • Or all characters exist in the structure, but the last letter is not marked as the end of the word node.
    searchWord(key){
        var currentValue = this.root;
        for (let index = 0; index < key.length; index++) {
            const element = key[index];
            if (currentValue.children[element]) {
                currentValue = currentValue.children[element];
            } else{
                return null;
            }
        }
        return currentValue;
    }
  • Suggestion Word

The main goal of this function is to show all words that have a prefix in common. At the beginning, is searched if the set of string already has been inserted in the structure and returns a list that contains all words that contain the word as a prefix.


    suggestionWord(key) {
        var word = this.searchWord(key);
        if(word){
            var suggestions = [];
            if(word.isEndOfWord){
                suggestions.push(key);
            }
            return this._suggestionWord(word, key, suggestions);
        }
        return [];
    }


    _suggestionWord(node, lastWord, suggestions){

        var letters = Object.keys(node.children); 
        for (let index = 0; index < letters.length; index++) {
            const element = letters[index];
            if(node.children[element].isEndOfWord){
                suggestions.push(lastWord + node.children[element].character);
                this._suggestionWord(node.children[element], lastWord + node.children[element].character, suggestions);
            }else{
                var rest = lastWord + node.children[element].character;
                this._suggestionWord(node.children[element], rest, suggestions);
            }
        }

        return suggestions;
    }

  • Remove

In this function, the word is removed from the structure if contains the prefix and does not have any other words that use as a prefix.

  remove(key) {
        if(this.search(key)){
            return this._removeNode(this.root ,key, key, 0);
        }else{
            return false;
        }
    }

    _removeNode(node, keySlice ,key, index) {
        var letter = key[index];
        var current = node.children[letter];
        if(current){
            keySlice = key.slice(index + 1, key.length);
            var shouldRemove = this._removeNode(current, keySlice, key, index + 1 );
            if(shouldRemove && !this.hasChild(node.children[letter].children)){
                this.length--;
                delete node.children[letter];
                key = keySlice;
                return true;
            }else{
                return false;
            }
        }
        return true;
    }


That's all folks! I hope you have fun learning. 😁

Alt Text

Code: https://github.com/FernandoBLima/data-structures


So we finished our discussion about Trie structure. 🙌

I hope you have a clear idea how to work. If you found this article helpful or if you find something I miss out or that you like it, feel free to let me know. 😁

DataStructure & Algorithm (6 Part Series)

1) Linked List, Queue and Stack - Data Structure & Algorithm Part I 2) Dictionary  and HashTable -  Data Structure & Algorithms Part II 3 ... 4 3) Set and MultiSet -  Data Structure & Algorithm Part III 4) Disjoint Set -  Data Structure Part IV 5) Tree and Binary Search Tree - Data Structure & Algorithm Part V 6) Trie - Data Structure & Algorithm Part VI

Posted on May 27 by:

fernandoblima profile

Fernando Baptistella

@fernandoblima

I’m a software development enthusiast and interested in machine learning, data mining, design patterns and new technologies.

Discussion

markdown guide
 

Nice example, and very nicely explained. I'd be keen to discuss this though... While for your example I can see it working well, finding everything with a common prefix is nice but it's not really faster than a hash for a single item lookup right?

This isn't an O(1) like a hash -> you have to do an O(1) for every character in the string on the lookup in the worst case (finding a complete match). Whereas a hash is doing an O(1) once. Sure to create the hash you have to touch a character in the string, but this is a mathematical operation not a table lookup. In your example you are using hashes at every character position anyway because that's how property lookups work. So I guess we could argue that both are O(N) where N is the length of the string. Locality of reference though, memory accesses etc, and for a single item lookup the hash is going to win by a significant margin.

In this you can see that there is also an effect of the number of items increasing the search time of the Trie (and not impacting the POJO hash lookup). Is this an implementation detail or something i'm not getting in the algorithm? I'm keen to get the best version of this for a project of mine. At present I'm finding an index in a sorted list using a binary search and gallop. Trie looks like it could be better than that, but not sure due to the number of hashed deferences at each letter - feels like it might be length of string dependent.

 

Thanks for your comment! After reading it, I believe I was misunderstood because I did not explain it correctly.

I’ll try to improve the implementation of this code and see the effect when the number of items is increasing, but, as I said, my code was just a simple example of trie structure. And it is necessary to consider that the native associative array/hash of the javascript will have better results because already have a nice well-optimized implementation. That said, the array will be more faster than most general purposes.

I agree with you when you said that the trie isn't an O(1) as a hash-table... But the trie can perform better in some complex problems, such as a search engine or autocomplete feature comparing with the hash table structure. Because we have to consider an imperfect associative array and it's almost impossible to create a uniform random distribution to avoid conflicts that lead to handling collisions by separate chaining or linear/double/Quadratic Proibing, the time to calculate the hash table, memory accesses...

I did't understand the problem in your project, but if you want, you can describe more and I can try to help you :)