DEV Community

Bruno Oliveira
Bruno Oliveira

Posted on

Serialize and Deserialize a Binary Tree

Following on my series of posts where I solve interview problems, let's look at this problem today:

This problem was asked by Google.
Given the root to a binary tree, implement serialize(root), which serializes the tree into a string, and deserialize(s), which deserializes the string back into the tree.
For example, given the following Node class

class Node:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

The following test should pass:

node = Node('root', Node('left', Node('left.left')), Node('right'))
assert deserialize(serialize(node)).left.left.val == 'left.left'

Introducing the Node class

The approach we can take to solve this problem, is to extend the available methods in the Node class, to treat it like a full binary tree.
More specifically, a binary tree is "just a root node", where by following the children of the node left and right, we can get the full tree.

We will add a toString() method to the Node class which will serve as our serialize(root) method.

Understanding the toString() and serialize(root) methods

Let's look at the implementations for these two methods, which will enable us to serialize a full binary tree into its string representation:

@Override
    public String toString() {
        return "Node(" + this.value + "," + (this.getLeft() != null ? this.getLeft().toString() : "null") + "," + (
                this.getRight() != null ?
                        this.getRight().toString() :
                        "null") + ")";
    }

Our toString() method should work recursively on this class, starting from the root node of the tree and additionally it should follow the representation from the problem statement.

We do just that by prepending each node's string representation with "Node(" and adding the value of the node, comma, and then recursively do the same for the left child, adding a comma, and then the right child.

Our recursion stops when we reach nodes with no more children, on which case we simple serialize it by showing "null" for both its children to indicate that it's a terminal node.

Then, our serialize(root) method can simply delegate to the toString() of the root of the binary tree. In pseudo-code:

serialize(root):
  return root.toString()

Our complete Node class looks like:

public class Node {
    private String value;
    private Node left;
    private Node right;

    public Node(String value, Node left, Node right) {
        this.value = value;
        this.left = left;
        this.right = right;

    }

    public Node addLeft(Node n) {
        this.left = n;
        return this;
    }

    public Node addRight(Node n) {
        this.right = n;
        return this;
    }

    public Node getLeft() {
        return left;
    }

    public Node getRight() {
        return right;
    }

    public String getValue() {
        return value;
    }

    @Override
    public String toString() {
        return "Node(" + this.value + "," + (this.getLeft() != null ? this.getLeft().toString() : "null") + "," + (
            this.getRight() != null ?
                this.getRight().toString() :
                "null") + ")";
    }

Let's move on and look at how the deserialization will work. For the scope of this problem, deserialization will refer to the process of converting a given string representation into a binary tree.

How to approach the deserialization process

The deserialization of the tree is arguably the hardest part of the whole problem, and, the way I approached this was using a combination of string parsing and recursive construction of the tree. Let's see how.

Annotating the tree with meta-information

The idea is that, from a given string representation, we can extract useful information regarding the structure of the tree, and using that meta-information, we can parse the tree much easily from its string representation. The information we are interested in extracting is:

  • The value of the root node of the tree

  • The existence of either both or a single or even none sub-trees

If we know information about the root node and about its left and right subtrees, we basically solve the problem, since recursion can solve it for us all the way down to the leaves, where there are no children and the recursion can terminate safely. Let's see how to annotate the tree:

public List<Pair<String, Integer>> annotateSerializedTree(String stringTree) {
        List<Pair<String, Integer>> indexes = new ArrayList<>();
        int parCnt = -1;
        int commaCnt = -1;
        int idx = 0;
        for (Character c : stringTree.toCharArray()) {
            if (c == '(') {
                parCnt++;
                indexes.add(new Pair<>("p" + parCnt, idx));
            }
            if (c == ')') {
                parCnt--;
                indexes.add(new Pair<>("p" + parCnt, idx));
            }
            if (c == ',') {
                commaCnt = parCnt;
                indexes.add(new Pair<>("c" + commaCnt, idx));
            }
            idx++;
        }
        return indexes;
    }

Let's go over this method since it's quite complex.

The idea is to index the positions of several relevant characters in the string to identify the root and both the left and right subtrees.

Let's take this example:

Node(4,Node(2,null,null),Node(8,null,null))

Indexes list would contain:

[p0,c0,p1,c1,c1,p0,c0,p1,c1,c1,p0,p(-1)]

This way of indexing has some advantages versus solely enumerating the special characters we need:

Firstly, the index in front of each parenthesis or comma will be proportional to the maximum depth of the tree. This will allow us to know just how deep the tree is. Indirectly, this is very useful since it also means that "top level" commas and parenthesis in the string representation can give us information about the root and subtrees just like we needed.

Secondly, it's now very easy to extract this information from the tree thanks to the fact that we used a Pair class to store the indexes of each type and level in the string.

public class Pair<A, B> implements Comparable {
    private final A fst;
    private final B snd;

    public Pair(A fst, B snd) {
        this.fst = fst;
        this.snd = snd;
    }

    public A getFst() {
        return fst;
    }

    public B getSnd() {
        return snd;
    }

    @Override
    public String toString() {
        return fst.toString() + ", " + snd.toString();
    }

    @Override
    public int compareTo(Object o) {
        Integer other = (Integer)((Pair)o).getSnd();
        Integer thisObj = (Integer)((Pair)this).getSnd();
        if (other.equals(thisObj)) {
            return 0;
        }
        return (other > thisObj) ? -1 : 1;
    }
}

This is the pair class we will use, which needs to implement comparable since we will need to sort elements of its type.

Returning to the problem at hand, it's now very easy to extract information from the annotated tree, namely, it's easy to extract information regarding the root and the subtrees:

 public Node extractRootUsingAnnotations(String stringTree, List<Pair<String, Integer>> annotations) {
        int firstParIdx = annotations.stream()
            .filter(e -> e.getFst().equals("p0"))
            .findFirst()
            .map(Pair::getSnd)
            .orElse(-1);
        int firstCommaIdx = annotations.stream()
            .filter(e -> e.getFst().equals("c0"))
            .findFirst()
            .map(Pair::getSnd)
            .orElse(-1);
        String rootValue = stringTree.substring(firstParIdx + 1, firstCommaIdx);
        return new Node(rootValue, null, null);
    }

So the root of our deserialized tree can be fully rebuilt by finding the value of the root of the original tree using the annonations we created earlier and building a new root node with that value, and, for now, no children, since we simply extract the root node in a first step.

Extracting the children - with a twist

Now that we have the root of the tree, we need to extract the rest of the tree, but, we won't do it on a node basis, instead, we will simply focus on the top level, i.e., we first extract the left subtree and then the right subtree and we can delegate the work of rebuilding the tree completely to the deserialize method by "believing in the recursion".

The idea is that if we extract the left and right subtrees completely, they are now completely identical to two completely new and independent binary trees. Namely, each subtree has its own "root", so we can do this by focusing on appending "root" nodes as we cascade down the tree until we have successfully rebuilt it!

Taking this into account, the next method simply checks for the existence of children of the root node and it returns them if they exist as a list of two strings:

 public List<String> checkIfChildrenExist(String stringTree, List<Pair<String, Integer>> annotations) {
        int firstCommaIdx = annotations.stream()
            .filter(e -> e.getFst().equals("c0"))
            .findFirst()
            .map(Pair::getSnd)
            .orElse(-1);
        int lastCommaIdx = annotations.stream()
            .sorted(Collections.reverseOrder())
            .filter(e -> e.getFst().equals("c0"))
            .findFirst()
            .map(Pair::getSnd)
            .orElse(-1);

        String leftSubtree = stringTree.substring(firstCommaIdx + 1, lastCommaIdx);
        String rightSubtree = stringTree.substring(lastCommaIdx + 1, stringTree.length() - 1);
        return Arrays.asList(leftSubtree, rightSubtree);
    }

So now, we can put the full deserialization process together quite nicely and as a bonus, with self-descriptive code:

public Node deserialize(String stringTree) {
        List<Pair<String, Integer>> annotations = annotateSerializedTree(stringTree);
        Node root = extractRootUsingAnnotations(stringTree, annotations);
        List<String> children = checkIfChildrenExist(stringTree,
            annotations);
        //left and/or right subtrees of root - null if absent
        if (!children.get(0).equals("null")) {
            root.addLeft(deserialize(children.get(0)));
        }
        if (!children.get(1).equals("null")) {
            root.addRight(deserialize(children.get(1)));
        }
        return root;
    }

As we can see, the most important step in this method is the construction of the tree via recursion from the root:

  • The recursion ends when there are no more children on the current node (if it is "null", then we know to end the recursion).

  • The tree itself gets built by adding the left and right children to the root node we just extracted via annotations, and then by recursing on the current node's children with:

root.addLeft(deserialize(children.get(0))); and root.addRight(deserialize(children.get(1)));

we "gain" the original tree structure for free thanks to the recursion.

Adding some unit tests

Last but not least, it is very important to test our code using unit tests, because they provide a solid base of understanding to future readers of the code and they prevent regressions to creep into the code base if the underlying tree structure changes. Here is the unit test suite I added:

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

class TreeSerializerTest {

    @Test
    void serialize() {
        TreeSerializer serializer = new TreeSerializer(new Node("root", null, null));
        serializer.serialize(serializer.getTree());
        assertEquals("Node(root,null,null)", serializer.serialize(serializer.getTree()));
    }

    @Test
    void serializeComplex() {
        TreeSerializer serializer = new TreeSerializer(
            new Node("root", new Node("left", null, new Node("leftChild", null, null)), null));
        serializer.serialize(serializer.getTree());
        assertEquals("Node(root,Node(left,null,Node(leftChild,null,null)),null)",
            serializer.serialize(serializer.getTree()));
    }

    @Test
    void deserializeAndCheckRoot() {
        TreeSerializer serializer = new TreeSerializer(
            new Node("root", new Node("left", null, new Node("leftChild", null, null)), null));
        Node treeRoot = serializer.deserialize("Node(root,Node(left,null,Node(leftChild,null,null)),null)");
        assertEquals("root", treeRoot.getValue());
    }

    @Test
    void deserializeAndCheckDeepInHierarchy() {
        TreeSerializer serializer = new TreeSerializer(
            new Node("root", new Node("left", null, new Node("leftChild", null, null)), null));
        Node treeRoot = serializer.deserialize("Node(root,Node(left,null,Node(leftChild,null,null)),null)");
        assertEquals("leftChild", treeRoot.getLeft().getRight().getValue());
    }
}

It's simply worthwhile to note that to have meaningful tests, it's necessary to test both serialization and deserialization.

Conclusions

I hoped you liked reading and hope you managed to learn something useful from the way I approached this problem. Different approaches and comments are welcome.

Top comments (0)