loading...

Java Daily Coding Problem #003

awwsmm profile image Andrew (he/him) ・14 min read

Daily Coding Problem is a website which will send you a programming challenge to your inbox every day. I want to show beginners how to solve some of these problems using Java, so this will be an ongoing series of my solutions. Feel free to pick them apart in the comments!

Problem

Given the root to a binary tree, implement serialize(root), which serializes the tree into a string, and deserialize(s), which deserializes the string back into the tree.

For example, given the following Node class

class Node:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

The following test should pass:

node = Node('root', Node('left', Node('left.left')), Node('right'))
assert deserialize(serialize(node)).left.left.val == 'left.left'

Strategy

Spoilers! Don't look below unless you want to see my solution!


Okay, so this clearly wasn't written with Java in mind, as the example code and test are given in Python. The first thing we should do is rewrite the code above in Java.

__init()__ in Python acts like -- but is not a direct analog of -- an object constructor in Java. So we should define a Java Node class with a constructor that takes a val (value) and two Nodes as children.

In the Python example, the child nodes default to None, which is again sort-of, kind-of, not really like a null value in Java. So we should allow left and/or right to be null.

Python passes an explicit self to any method defined for an object, unlike most other programming languages. This is why there's a self in the __init__() method (the first of four arguments), but there are only three arguments passed to create node in the test. In Java, this is passed implicitly to the methods of any object, so we can drop it in the Java version.

The crux of this problem is asking for a String Node::toString() method on our Java Node class which converts the Node to a serialized (String) representation of the object. That representation should then be able to be converted directly into a Node with a Node Node::fromString() method or something similar. These two methods are the serialization and de-serialization methods, respectively.

Code

Building the Node class

Let's start by building this bare-bones Node class as defined by the prompt:

public class Node {

  private int val;
  private Node left;
  private Node right;

  public Node (int val, Node left, Node right) {

    this.val = val;
    this.left = left;
    this.right = right;

  }

}

This can be instantiated very easily in the jshell:

jshell> /open Node.java

jshell> Node n = new Node(3, null, null)
n ==> Node@f6c48ac

This is -- I think -- the simplest possible implementation of this class which satisfies the requirements of the prompt. Note that val must have a type of some kind, so I made it an int above. We can make the node generic with:

public class Node<T> {

  private T val;
  private Node<T> left;
  private Node<T> right;

  public Node (T val, Node<T> left, Node<T> right) {

    this.val = val;
    this.left = left;
    this.right = right;

  }

}

Though now we have to explicitly state the type of data held within the Node:

jshell> /open Node.java

jshell> Node<Integer> n = new Node<>(3, null, null)
n ==> Node@14bf9759

Since every Java class is a subclass of Object, we can declare all Nodes as Node<Object>s if this is too much of a burden. (But if we're going to do that, we might as well just make the type of val Object and forgo the generics.)

Anyway, back to the task at hand. Let's create a serialization method which we'll call toString() (the Java standard). By default, toString() called on an object returns the type of that object and its hash:

jshell> n.toString()
$13 ==> "Node@5f341870"

We want our serialization method to include a full description of the Node so it can be reconstructed from the returned String, if necessary. Here we get into a bit of trouble.

Unless we restrict ourselves to some subset of objects (say numbers, characters, and strings) and write some code which can easily parse these objects from their String representations, it's going to be very difficult to serialize our Nodes in this manner.

Although it's not explicit in the prompt, only strings are used as data in the test. I think it's fair if we restrict ourselves to using only String vals. In that case, we can again rewrite our Node:

public class Node {

  private String val;
  private Node left;
  private Node right;

  public Node (String val, Node left, Node right) {

    this.val = val;
    this.left = left;
    this.right = right;

  }

}

In the example, we can easily access the left (or right) children of a Node by calling left() or right() methods. We can also get the val with a val() method. Let's add those:

public class Node {

  private String val;
  private Node left;
  private Node right;

  public Node (String val, Node left, Node right) {

    this.val = val;
    this.left = left;
    this.right = right;

  }

  public Node left() { return this.left; }

  public Node right() { return this.right; }

  public String val() { return this.val; }

}

This works as expected in the jshell...

jshell> Node n = new Node("3", null, new Node("right", null, null))
n ==> Node@10bdf5e5

jshell> n.left()
$19 ==> null

jshell> n.right()
$20 ==> Node@617faa95

jshell> n.val()
$21 ==> "3"

...but we still need those serialization / deserialization methods. One last thing to do before we add them: although Python is more flexible than Java in that you can provide the arguments to a method in any order using named parameters, in the test above, one of the nodes is created with only a single (left) child by ommitting the right child. Another is created with no children nodes at all. Let's add some alternative constructors with one and zero child Nodes to emulate this behaviour:

public class Node {

  private String val;
  private Node left;
  private Node right;

  public Node (String val, Node left, Node right) {
    this.val = val;
    this.left = left;
    this.right = right;
  }

  public Node (String val, Node left) {
    this.val = val;
    this.left = left;
    this.right = null;
  }

  public Node (String val) {
    this.val = val;
    this.left = null;
    this.right = null;
  }

  public Node left() { return this.left; }

  public Node right() { return this.right; }

  public String val() { return this.val; }

}

To avoid repeating ourselves, we could also do:

public class Node {

  private String val;
  private Node left;
  private Node right;

  public Node (String val, Node left, Node right) {
    this.val = val;
    this.left = left;
    this.right = right;
  }

  public Node (String val, Node left) {
    this(val, left, null);
  }

  public Node (String val) {
    this(val, null, null);
  }

  public Node left() { return this.left; }

  public Node right() { return this.right; }

  public String val() { return this.val; }

}

Now we can create the same node as in the example, using almost the exact same syntax:

jshell> /open Node.java

jshell> Node node = new Node("root", new Node("left", new Node("left.left")), new Node("right"))
node ==> Node@4e1d422d

Serialization

Finally, we can create our serialization and de-serialization methods. Let's start with serialization. We need to write a method which encodes within it all of the information about a given Node, its val, and all of its children and their vals. This sounds quite complex.

In the simplest case, we have a Node with no children (where left and right are both null). Let's tackle that first. We could do something like:

  public String toString() {

    StringBuilder sb = new StringBuilder("Node(");

    if (val == null) {
      sb.append("null");

    } else {
      sb.append("\"");
      sb.append(val);
      sb.append("\"");
    }

    if (this.left == null) {
      sb.append(", null");
    }

    if (this.right == null) {
      sb.append(", null");
    }

    sb.append(")");

    return sb.toString();

  }

Note that I use StringBuilder rather than String, as it's more performant when we're concatenating lots of small Strings together.

When an object is created in jshell, it uses the object's toString() method to print it to the terminal, so let's see how our method works for a childless Node:

jshell> /open Node.java

jshell> Node node = new Node("test", null, null);
node ==> Node("test", null, null)

Looking good! Now, when a Node has children, we need to perform these steps recursively. In place of one of the nulls above, there should be another Node(...) printed. This is an easy fix -- if left or right is non-null, we just call left.toString() or right.toString() within the toString() method:

  public String toString() {

    StringBuilder sb = new StringBuilder("Node(");

    if (val == null) {
      sb.append("null");

    } else {
      sb.append("\"");
      sb.append(val);
      sb.append("\"");
    }

    if (this.left == null) {
      sb.append(", null");

    } else {
      sb.append(", ");
      sb.append(this.left.toString());
    }

    if (this.right == null) {
      sb.append(", null");

    } else {
      sb.append(", ");
      sb.append(this.right.toString());
    }

    sb.append(")");

    return sb.toString();

  }

}

How does this work?

jshell> /open Node.java

jshell> Node node = new Node("test", null, new Node("right", null, null));
node ==> Node("test", null, Node("right", null, null))

jshell> Node node = new Node("root", new Node("left", new Node("left.left")), new Node("right"))
node ==> Node("root", Node("left", Node("left.left", null, ... Node("right", null, null))

...just fine! So we've successfully implemented our serialization method, toString(). What about deserialization? This is a bit more complicated.

To deserialize, there are a few things we need to make note of:

  • all vals are either null or Strings surrounded by quotes
  • all Nodes are either null or begin with Node(
  • Nodes can have 0, 1, or 2 children

This is complex. The serialization method was designed to make the output human-readable and not much else. Let's see if we can tweak it to make it more easily parseable by the de-serialization method:

  public String toString() {

    StringBuilder sb = new StringBuilder("Node: ");

    if (val == null) {
      sb.append("null");

    } else {
      sb.append("\"");
      sb.append(val);
      sb.append("\"");
    }

    if (this.left == null) {
      sb.append("\n  null");

//    } else {
//      sb.append(", ");
//      sb.append(this.left.toString());
    }

    if (this.right == null) {
      sb.append("\n  null");

//    } else {
//      sb.append(", ");
//      sb.append(this.right.toString());
    }

    sb.append("\n");

    return sb.toString();

  }

Now, the output looks like this:

jshell> /open Node.java

jshell> Node node = new Node("test");
node ==> Node: "test"
  null
  null


jshell> System.out.print(node.toString())
Node: "test"
  null
  null

In this revised version, the val is printed after a Node: and the left and right children are printed beneath the val. What does it look like when the Node has some children? Does a straightforward approach work?

  public String toString() {

    StringBuilder sb = new StringBuilder("Node: ");

    if (val == null) {
      sb.append("null");

    } else {
      sb.append("\"");
      sb.append(val);
      sb.append("\"");
    }

    if (this.left == null) {
      sb.append("\n  null");

    } else {
      sb.append("\n  ");
      sb.append(this.left.toString());
    }

    if (this.right == null) {
      sb.append("\n  null");

    } else {
      sb.append("\n  ");
      sb.append(this.right.toString());
    }

    sb.append("\n");

    return sb.toString();

  }
jshell> /open Node.java

jshell> Node node = new Node("test", null, new Node("right"));
node ==> Node: "test"
  null
  Node: "right"
  null
  null



jshell> System.out.print(node.toString())
Node: "test"
  null
  Node: "right"
  null
  null


...ooh, not quite. We've got too many blank lines at the end, and right's children aren't indented as expected. We can create a second toString() method that takes a parameter indent. This can be the number of levels the child Node should be indented in the output.

After a bit of tweaking, I end up with something like this:

  public String toString() {
    return toString(0);
  }

  public String toString (int indent) {

    String spacer = "  ";
    String bump = String.join("", Collections.nCopies(indent, spacer));

    StringBuilder sb = new StringBuilder(bump);
    sb.append("Node: ");

    bump = bump + spacer;

    if (val == null) {
      sb.append("null");

    } else {
      sb.append("\"");
      sb.append(val);
      sb.append("\"");
    }

    if (this.left == null) {
      sb.append("\n");
      sb.append(bump);
      sb.append("null");

    } else {
      sb.append("\n");
      sb.append(this.left.toString(indent+1));
    }

    if (this.right == null) {
      sb.append("\n");
      sb.append(bump);
      sb.append("null");

    } else {
      sb.append("\n");
      sb.append(this.right.toString(indent+1));
    }

    return sb.toString();

  }

...which looks something like this:

jshell> Node node = new Node("test", new Node("left"), new Node(null));
node ==> Node: "test"
  Node: "left"
    null
    null
  Node: null
    null
    null

jshell> System.out.print(node.toString())
Node: "test"
  Node: "left"
    null
    null
  Node: null
    null
    null

This is a slightly different serialization method which will hopefully make deserialization a bit easier. Now, to deserialize, we work our way from left to right. The left-most Node will always be the root Node. If we move two spaces to the right and look up and down that column, we'll find (on the top) the left child Node, which is either null, or a Node itself. On the bottom, we find the right child Node. The val held by a node always comes after the Node: marker and is always a String surrounded by quotes, unless it is null.

Deserialization

To deserialize the above output, we need to parse the nodetree from left to right. The val held by a Node appears after the Node: marker and its children (left and right) are found beneath it. Let's again begin with the simplest example, a Node with no children:

jshell> Node node = new Node("test");
node ==> Node: "test"
  null
  null

jshell> System.out.print(node.toString())
Node: "test"
  null
  null

To start, we find the val by simply looking at everything after "Node:". If the resulting substring is surrounded by quotes, we remove them. Otherwise, the substring must be null:

  public static Node fromString (String serialized) {

    String marker = "Node: ";
    int valStart = serialized.indexOf(marker) + marker.length();
    int valEnd   = serialized.indexOf("\n");

    String val = serialized.substring(valStart, valEnd);

    if (val.charAt(0) == '"')
      val = val.substring(1, val.length()-1);

    else
      val = null;

    System.out.println("val:");
    System.out.println(val);

    return null;

  }

This will print:

jshell> /open Node.java

jshell> Node node = new Node("test", new Node("left"), new Node(null));
node ==> Node: "test"
  Node: "left"
    null
    null
  Node: null
    null
    null

jshell> node.toString()
$84 ==> "Node: \"test\"\n  Node: \"left\"\n    null\n    null\n  Node: null\n    null\n    null"

jshell> Node.fromString(node.toString())
val:
test
$85 ==> null

So val was parsed correctly! Next is just parsing the children. If they're null, we add a null child. If they're not, we recursively call the fromString() method on them! But they could be pretty far down the tree (left might have 17 generations of children and grandchildren, for instance). So how do we find right?

We know that it's indented exactly two spaces in our serialized representation, so we can split the String into lines and find the only two lines which should be indented exactly two spaces before a null or a Node: appears.

Or, since all lines after the first begin with a \n followed by some number of spaces, we can replace all instances of \n plus two spaces with just \n, then remove the first line, then look for the two lines which don't begin with any spaces.

  public static Node fromString (String serialized) {

    String marker = "Node: ";
    int valStart = serialized.indexOf(marker) + marker.length();
    int valEnd   = serialized.indexOf("\n");

    String val = serialized.substring(valStart, valEnd);

    if (val.charAt(0) == '"')
      val = val.substring(1, val.length()-1);
    else val = null;

    String modified = serialized.replaceAll("\n  ", "\n");
    modified = modified.substring(valEnd+1);

    System.out.println(modified);

    return null;
  }

The result of the above is "de-denting" the entire serialized output two spaces:

jshell> Node node = new Node("test", new Node("left"), new Node(null));
node ==> Node: "test"
  Node: "left"
    null
    null
  Node: null
    null
    null

jshell> System.out.print(node.toString())
Node: "test"
  Node: "left"
    null
    null
  Node: null
    null
    null
jshell> Node.fromString(node.toString())
Node: "left"
  null
  null
Node: null
  null
  null
$102 ==> null

Now, from the first line down, we have the left Node, and from the first non-indented line after the first, we have the right Node. We need to handle null nodes, as the code we've written now would throw an error for them:

  public static Node fromString (String serialized) {

    String marker = "Node: ";
    int valStart = serialized.indexOf(marker);

    if (valStart < 0) return null; 

    valStart += marker.length();
    int valEnd = serialized.indexOf("\n");

    String val = serialized.substring(valStart, valEnd);

    if (val.charAt(0) == '"')
      val = val.substring(1, val.length()-1);
    else val = null;

    String modified = serialized.replaceAll("\n  ", "\n");
    modified = modified.substring(valEnd+1);

    System.out.println(modified);

    return null;
  }

...and then we need to find the two non-indented lines so we can run this process again on the left and right child Nodes. A non-indented line will be either null or a Node, so it will be the only instance where the \n line break character is immediately followed by an n or an N:

jshell> /open Node.java

jshell> Node node = new Node("test", new Node("left"), new Node(null));
node ==> Node: "test"
  Node: "left"
    null
    null
  Node: null
    null
    null

jshell> System.out.print(node.toString())
Node: "test"
  Node: "left"
    null
    null
  Node: null
    null
    null
jshell> Node.fromString(node.toString())
Node: "left"
  null
  null
Node: null
  null
  null
27
$110 ==> null

...so right starts at character index 27. Finally, we can split the children into right and left nodes and run this process on them again, recursively (comments added, as well):

  public static Node fromString (String serialized) {

    // is this a non-null Node?
    String marker = "Node: ";
    int valStart = serialized.indexOf(marker);

    // if null, return a null Node
    if (valStart < 0) return null;

    // otherwise, get the `val` of the Node
    valStart += marker.length();
    int valEnd = serialized.indexOf("\n");
    String val = serialized.substring(valStart, valEnd);

    // is `val` null?
    if (val.charAt(0) == '"')
      val = val.substring(1, val.length()-1);
    else val = null;

    // de-dent the serialized representation and look for children
    String modified = serialized.replaceAll("\n  ", "\n");
    modified = modified.substring(valEnd+1);

    // at what character does the `right` child start?
    int rightStart = Math.max(
        modified.indexOf("\nN"),
        modified.indexOf("\nn")
      ) + 1;

    // child node `left`
    Node left = null;

    // if `left` is not `null`
    if (modified.substring(0, 4) != "null") 
      left = fromString(modified.substring(0, rightStart));

    // child node `right`
    Node right = null;

    // if `right` is not `null`
    if (modified.substring(rightStart, rightStart+4) != "null")
      right = fromString(modified.substring(rightStart));

    return new Node(val, left, right);
  }

And here it is running in the jshell:

jshell> /open Node.java

jshell> Node node = new Node("test", new Node("left"), new Node(null));
node ==> Node: "test"
  Node: "left"
    null
    null
  Node: null
    null
    null

jshell> System.out.print(node.toString())
Node: "test"
  Node: "left"
    null
    null
  Node: null
    null
    null
jshell> Node copy = Node.fromString(node.toString())
copy ==> Node: "test"
  Node: "left"
    null
    null
  Node: null
    null
    null

jshell> System.out.print(copy.toString())
Node: "test"
  Node: "left"
    null
    null
  Node: null
    null
    null

Let's check the test example given in the prompt:

jshell> Node node = new Node("root", new Node("left", new Node("left.left")), new Node("right"))
node ==> Node: "root"
  Node: "left"
    Node: "left.left" ...  "right"
    null
    null

jshell> Node.fromString(node.toString()).left().left().val()
$129 ==> "left.left"

jshell> Node.fromString(node.toString()).left().left().val().equals("left.left")
$130 ==> true

It works as expected!

Discussion

This took much longer than I thought it would initially. My first attempt at serialization led to -- what I thought would be -- a very complex deserialization. So I refactored to a less elegant-looking serialization which had a much easier deserialization.

Both of my serialization and deserialization methods rely on recursion to get the job done, which I think is the best way to go about this. (You don't know ahead of time how deep the nodetree is going to be.)

The deserialization method might not be optimal, as its written in such a way that lots of data needs to be held on the stack. It's not tail call optimizable as written. I think, ideally, we would want to find the most deeply-nested Nodes first, create them, and move up the tree, rather than moving from the top-down. It's not immediately obvious to me how we would go about doing that.

I've written nodetrees in the past with much nicer-looking toString() implementations, though they're not serialized, as all of the data is not contained within the String representation. This is because these nodetrees often allow data which is not strictly String in nature.

One last small thing to note is -- because I'm using \n to separate the nodes, if \n appears in a val anywhere, there could be problems. Input should be sanitised or a special exception should be added so that a \n within a val doesn't break the deserialization.


All the code for my Daily Coding Problems solutions is available at github.com/awwsmm/daily.

Suggestions? Let me know in the comments.

Posted on by:

awwsmm profile

Andrew (he/him)

@awwsmm

Got a Ph.D. looking for dark matter, but not finding any. Now I code full-time. Je parle un peu français. dogs > cats

Discussion

pic
Editor guide