## DEV Community

Andrew (he/him)

Posted on • Updated on

# Custom (De)serialization with Scala (Scala DCP #003)

Daily Coding Problem is a website which will send you a programming challenge to your inbox every day. I want to show beginners how to solve some of these problems using Scala, so this will be an ongoing series of my solutions. Feel free to pick them apart in the comments!

### Problem

Given the root to a binary tree, implement `serialize(root)`, which serializes the tree into a string, and `deserialize(s)`, which deserializes the string back into the tree.

For example, given the following Node class

``````class Node:
def __init__(self, val, left=None, right=None):
self.val = val
self.left = left
self.right = right
``````

The following test should pass:

``````node = Node('root', Node('left', Node('left.left')), Node('right'))
assert deserialize(serialize(node)).left.left.val == 'left.left'
``````

### Strategy

See my discussion of this problem (and my Java implementation) here!

### Code

I feel like I'm cheating a bit for these first few Daily Coding Problems, because I've solved them previously using Java. Maybe we can improve on the Java solution with Scala, though. Here's the `Node` class I defined in Java, without the `serialize` and `deserialize` methods:

``````public class Node {

private String val;
private Node left;
private Node right;

public Node (String val, Node left, Node right) {
this.val = val;
this.left = left;
this.right = right;
}

public Node (String val, Node left) {
this(val, left, null);
}

public Node (String val) {
this(val, null, null);
}

public Node left() { return this.left; }

public Node right() { return this.right; }

public String val() { return this.val; }

}
``````

Believe it or not, all of this can be replaced with a single line of Scala code:

``````case class Node (value: String, left: Node = null, right: Node = null)
``````

A `case class` in Scala is shorthand for a class which contains only immutable member variables. In the example above, `value`, `left`, and `right` will be assignable only when a `Node` object is created, and cannot be changed once that object is instantiated. `case class`es also define a companion object of the same name which can be used for (among other things) making object instantiation a bit less verbose.

The above Scala code also uses default arguments to define `left` and `right` as `null` when they're not specified by the user. So a `Node` object can be created with zero, one, or two child `Node`s, without needing to define additional constructors.

Finally, `case class`es provide getters (but not setters, because of the aforementioned immutability) for all of the member variables defined in the argument list. They will have the same names as the arguments themselves.

So that single line of Scala code above allows us to do all of the following:

``````scala> val nA = Node("nA")
nA: Node = Node(nA,null,null)

scala> val nB = Node("nB", nA)
nB: Node = Node(nB,Node(nA,null,null),null)

scala> val nC = Node("nC", nB, nA)
nC: Node = Node(nC,Node(nB,Node(nA,null,null),null),Node(nA,null,null))

scala> nA.value
res4: String = nA

scala> nB.left
res5: Node = Node(nA,null,null)

scala> nC.right
res6: Node = Node(nA,null,null)
``````

If all of the above benefits of `case class`es weren't enough, they also define a default `toString` method which prints the internal data in a nice, human-readable format, as seen above. In the Java example, we needed to write that code, as well. To get the equivalent functionality of this single line of Scala code, we needed roughly fifty lines of Java. That's quite the reduction in complexity.

Next, I'd like to take a few artistic liberties. The problem asks us to define `serialize` and `deserialize` methods using (presumably) Python. Serializing an object really just means defining a `toString` method. In Python, the default `toString` equivalent is about as ugly as the default Java `toString`:

``````Python 2.7.16 (default, Dec 13 2019, 18:00:32)
[GCC 4.2.1 Compatible Apple LLVM 11.0.0 (clang-1100.0.32.4) (-macos10.15-objc-s on darwin
>>> class Node:
...     def __init__(self, val, left=None, right=None):
...         self.val = val
...         self.left = left
...         self.right = right
...
>>> node = Node('root', Node('left', Node('left.left')), Node('right'))
>>> print(node)
<__main__.Node instance at 0x10d312ab8>
``````

When we `print` the `Node`, it is serialized into a string representation using an automatically defined method. In Python, this method is called `__str__()`. Although `__str__()` can be redefined directly, the problem asks us to define a `serialize` method instead. Rather than reinventing the wheel, I would prefer to just use our default `toString` provided by our `Node` `case class` in Scala. The spirit of the problem is simply to be able to convert these objects to their string representations, then back into objects.

However, it's at this point that I'd like to draw your attention to something which may end up bring a problem during our deserialization process. Notice how our Scala `Node`s are serialized:

``````scala> Node("heyo")
res16: Node = Node(heyo,null,null)
``````

The `Node` itself is denoted by the literal string `Node(`, followed by comma-separated arguments, then closed with a `)`. A straightforward deserialization might involve using these three things as sentinels:

1. the string `Node(` begins the definition of a `Node`
2. commas `,` separate the `value` from the `left` and `right` `Node`s, and
3. the character `)` ends the definition of the `Node`

You might be able to see how this could be abused. First, imagine someone simply doesn't want their `Node` to have a `value`, they might define:

``````scala> Node("null",null,null)
res19: Node = Node(null,null,null)

scala> Node(null,null,null)
res20: Node = Node(null,null,null)
``````

Note how both of these `Node`s have the same string representation, even though they are really very different things. We can therefore `require` that the `value` of a `Node` is not equal to the string `"null"` by modifying the `case class` definition:

``````case class Node (value: String, left: Node = null, right: Node = null) {
require(value != "null") }
``````

Now the compiler will throw an `Exception` when our `value` is a string masquerading as a `null`:

``````scala> Node("null",null,null)
java.lang.IllegalArgumentException: requirement failed
at scala.Predef\$.require(Predef.scala:268)
... 29 elided
``````

Alternatively, we could `override` the default `toString` method to have it surround the `value` with quotes. Something like...

``````scala> Node("null",null,null)
res42: Node = Node("null",null,null)

scala> Node(null,null,null)
res43: Node = Node(null,null,null)
``````

This, I think, is a better solution. In addition to solving this `null` issue, it also allows users to put commas `,` within labels, which otherwise could result in confusing serialized outputs like:

``````scala> Node("null,null,null",null,null)
res42: Node = Node(null,null,null,null,null)
``````

...this would be a bit more difficult to parse. If you don't think that's too bad, imagine a more pathological example:

``````scala> Node("null,Node(null,Node(null,null,null),null)",null,null)
res42: Node = Node(null,Node(null,Node(null,null,null),null),null,null)
``````

So maybe it is best that we redefine the `toString` method, at least to surround the `value` in quotes. Below, we `override` the default `toString` method, and make use of triple quotes to avoid having to escape the literal `"` characters within the string:

``````scala> case class Node (value: String, left: Node = null, right: Node = null) {
|   override def toString = s"""Node("\$value",\$left,\$right)""" }
defined class Node

scala> Node("null,Node(null,Node(null,null,null),null)",null,null)
res27: Node = Node("null,Node(null,Node(null,null,null),null)",null,null)
``````

Remember that we're defining `toString` and `fromString` rather than `serialize` and `deserialize`. This is to avoid having the methods `toString` and `serialize` do nearly (but not exactly) the same thing.

...now we can clearly tell where the `value` begins and ends, by looking for the `"` characters (or the syntax highlighting for strings). We can then sanitize user inputs by replacing any `"` characters which appear within the `value` with their escaped equivalents:

``````scala> case class Node (value: String, left: Node = null, right: Node = null) {
|   override def toString = s"""Node("\${value.replaceAll("\"", "\\\\\"")}",\$left,\$right)""" }
defined class Node

scala> Node("""Dwayne "The Rock" Johnson""",null,null)
res28: Node = Node("Dwayne \"The Rock\" Johnson",null,null)
``````

Beautiful! Finally, we can parse the serialized output using this clever recursive pattern matching technique. The gist of this method is that we define a regular expression which matches `terminal` `Node`s (leaves) and another regular expression which matches `arbitrary` `Node`s. We try to match on the `terminal` regex and if that doesn't work, we try to match on the `arbitrary` one.

For our serialization technique, a `terminal` `Node` should match the pattern:

``````scala> val terminal = """Node\("((?:[^\"]|\")*)",null,null\)""".r
terminal: scala.util.matching.Regex = Node\("((?:[^\"]|\")*)",null,null\)

scala> val rock = Node("""Dwayne "The Rock" Johnson""",null,null)
rock: Node = Node("Dwayne \"The Rock\" Johnson",null,null)

scala> rock.toString match { case terminal(value) => println(s"\$value") }
Dwayne \"The Rock\" Johnson

scala> val list = Node("My list is List(1,2,3)",null,null)
list: Node = Node("My list is List(1,2,3)",null,null)

scala> list.toString match { case terminal(value) => println(s"\$value") }
My list is List(1,2,3)
``````

The `.r` at the end of the string indicates that it's a regular expression.

...but any arbitrary `Node` should match the pattern:

``````scala> val arbitrary = """Node\("((?:[^\"]|\\")*)",(null|Node\(.*\)),(null|Node\(.*\))\)""".r
arbitrary: scala.util.matching.Regex = Node\("((?:[^\"]|\\")*)",(null|Node\(.*\)),(null|Node\(.*\))\)

scala> val nodes = Node("null",rock,list)
nodes: Node = Node("null",Node("Dwayne \"The Rock\" Johnson",null,null),Node("My list is List(1,2,3)",null,null))

scala> nodes.toString match { case arbitrary(value, left, right) => println(s"\$value -- \$left -- \$right") }
null -- Node("Dwayne \"The Rock\" Johnson",null,null) -- Node("My list is List(1,2,3)",null,null)

scala> rock.toString match { case arbitrary(value, left, right) => println(s"\$value -- \$left -- \$right") }
Dwayne \"The Rock\" Johnson -- null -- null

scala> list.toString match { case arbitrary(value, left, right) => println(s"\$value -- \$left -- \$right") }
My list is List(1,2,3) -- null -- null
``````

The only remaining issue we have is when `value` is `null`, in which case `toString` throws a `NullPointerException`:

``````scala> val noll = Node(null,null,null)
java.lang.NullPointerException
at Node.toString(<console>:14)
at scala.runtime.ScalaRunTime\$.inner\$1(ScalaRunTime.scala:254)
at scala.runtime.ScalaRunTime\$.stringOf(ScalaRunTime.scala:259)
at scala.runtime.ScalaRunTime\$.replStringOf(ScalaRunTime.scala:267)
at .\$print\$lzycompute(<console>:9)
at .\$print(<console>:6)
at \$print(<console>)
...
``````

...we can avoid this by disallowing `null` `value`s in our constructor:

``````scala> case class Node (value: String, left: Node = null, right: Node = null) {
|   require (value != null)
|   override def toString = s"""Node("\${value.replaceAll("\"", "\\\\\"")}",\$left,\$right)"""
| }
defined class Node

scala> val noll = Node(null,null,null)
java.lang.IllegalArgumentException: requirement failed
at scala.Predef\$.require(Predef.scala:268)
... 29 elided
``````

Note that empty `value` strings are still allowed, though

``````scala> val noll = Node("",null,null)
noll: Node = Node("",null,null)
``````

Finally, using this SO answer as a template, we can deserialize any arbitrary `Node`:

``````scala> def fromString(serialized: String): Node = {
|   val arbitrary = """Node\("((?:[^\"]|\\")*)",(null|Node\(.*\)),(null|Node\(.*\))\)""".r
|   serialized match {
|     case arbitrary(value,left,right) => Node(
|       value.replaceAll("\\\\\"", "\""),
|       if (left == "null") null else fromString(left),
|       if (right == "null") null else fromString(right))
|   }
| }
fromString: (serialized: String)Node

scala> println(rock.toString); println(fromString(rock.toString).toString)
Node("Dwayne \"The Rock\" Johnson",null,null)
Node("Dwayne \"The Rock\" Johnson",null,null)

scala> println(list.toString); println(fromString(list.toString).toString)
Node("My list is List(1,2,3)",null,null)
Node("My list is List(1,2,3)",null,null)

scala> println(nodes.toString); println(fromString(nodes.toString).toString)
Node("null",Node("Dwayne \"The Rock\" Johnson",null,null),Node("My list is List(1,2,3)",null,null))
Node("null",Node("Dwayne \"The Rock\" Johnson",null,null),Node("My list is List(1,2,3)",null,null))
``````

And the final test...

``````scala> val node = Node("root",Node("left",Node("left.left")),Node("right"))
node: Node = Node("root",Node("left",Node("left.left",null,null),null),Node("right",null,null))

scala> fromString(node.toString).left.left.value == "left.left"
res34: Boolean = true
``````

Nice!

### Discussion

This problem is, in essence, an exercise in string manipulation. We want to make sure that our nodes are serialized in such a way that there are no degenerate cases (where two distinct nodes are mapped to the same serialized form), and we want to make sure that the user can't throw a wrench in the deserialization process by defining a `value` with hard-to-parse characters like `"` and `,`. Since these are our delimiters, the user could very easily make deserialization much more difficult for us by including them.

Since this is essentially a string manipulation problem, and since I'm pretty experienced with regex, I decided to go in that direction. There may be a better or clearer way of solving this problem in Scala without regex, but my entire solution is only 10 lines of code:

``````case class Node (value: String, left: Node = null, right: Node = null) {
require (value != null)
override def toString = s"""Node("\${value.replaceAll("\"", "\\\\\"")}",\$left,\$right)"""
}

def fromString(serialized: String): Node = {
val arbitrary = """Node\("((?:[^\"]|\\")*)",(null|Node\(.*\)),(null|Node\(.*\))\)""".r
serialized match {
case arbitrary(value,left,right) => Node(
value.replaceAll("\\\\\"", "\""),
if (left == "null") null else fromString(left),
if (right == "null") null else fromString(right))
}
}
``````

I think my solution is clear, simple, and probably about as performant as any string manipulation problem can be, but if you have a better (or different) solution, I'd love to see it!

All the code for my Daily Coding Problems solutions is available at github.com/awwsmm/daily.

Suggestions? Let me know in the comments.

If you enjoyed this post, please consider supporting my work by buying me a coffee!