Jaakko Pallari

Posted on Jun 15, 2020 • Originally published at lepovirta.org on May 31, 2016

Ways to pattern match generic types in Scala

#scala #shapeless #functional

Occasionally in Scala, you need to pattern match on values whose type information has been lost. This is easy when you know the specific type you want to extract, but when that type contains a type parameter or is a generic type itself, the solution is no longer straightforward.

For example, if you know the specific type of the value you want to extract, it’s easy to come up with a solution:

def extractString(a: Any): Option[String] = a match {
  case s: String => Some(s)
  case _ => None
}

But when you try to match on a generic type, type erasure in the JVM prevents you from performing the comparison at runtime. This is because generic types only exist in the compilation phase, and not during runtime. Scala’s generic types behave the same way, which is why List[String] is actually just a List at runtime. Because all lists have the same type regardless of their type parameter, it’s impossible to distinguish List[String] from List[Int] by its runtime type information only.
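A quick way to see erasure in action is to compare runtime classes. The following is a minimal sketch (the value names are arbitrary and not part of the later examples): both lists report the same runtime class, so nothing in that class tells their element types apart.

```scala
val strings: List[String] = List("a", "b")
val ints: List[Int] = List(1, 2)

// Widen to Class[_] so that the comparison is between plain class objects.
val stringsClass: Class[_] = strings.getClass
val intsClass: Class[_] = ints.getClass

// The type argument is not part of the runtime class.
val sameClass: Boolean = stringsClass == intsClass
println(sameClass) // true
```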

In this article, I demonstrate a few solutions that can be used when pattern matching against generic types. First, I’ll demonstrate two ways to avoid the problem altogether. After that, I’ll show how Shapeless features can be used for solving the problem. Finally, I’ll show how to solve the problem using Scala’s type tags.

Contents briefly:

Avoid losing the generic type
Avoid matching on generic type parameters
Introducing Typeable and TypeCase
Type tags
Conclusions

Avoid losing the generic type

Before you try to pattern match on generic types, try to figure out if you actually need it at all. If you have the control of the code that requires pattern matching, try to structure the code in a manner that doesn’t lose generic type information. Here’s an example of a structure that will encounter problems in pattern matching:

sealed trait Result
case class Ok[T](values: List[T]) extends Result
case class Fail(message: String) extends Result

def handleResult(result: Result): Unit = result match {
  case Fail(message) => println("ERROR: " + message)
  case Ok(vs: List[String]) => println("Got strings of total length: " + vs.map(_.size).sum)
  case _ => println("Got something else")
}

handleResult(Fail("something happened")) // output: "ERROR: something happened"
handleResult(Ok(List("this", "works")))  // output: "Got strings of total length: 9"
handleResult(Ok(List("doesn't", "work", 4))) // ClassCastException... oops

In the code above, notice the ClassCastException we get at runtime when passing in a list of mixed-type objects. An exception like this can be fatal in a live system, yet the code that causes it is easy to write by mistake. Ideally, these kinds of errors would be caught at compile time rather than at runtime.

Since the Result type didn’t enforce any restrictions on the type parameter of the Ok type, the type parameter information is lost. We can add the type parameter to the Result type to allow us to better manage the types:

sealed trait Result[+T]
case class Ok[+T](values: List[T]) extends Result[T]
case class Fail(message: String) extends Result[Nothing]

def handleResult(result: Result[String]): Unit = result match {
  case Fail(message) => println("ERROR: " + message)
  case Ok(vs) => println("Got strings of total length: " + vs.map(_.size).sum)
}

handleResult(Fail("something happened")) // output: "ERROR: something happened"
handleResult(Ok(List("this", "works")))  // output: "Got strings of total length: 9"
handleResult(Ok(List("doesn't", "work", 4))) // compile time error

Now that the type parameter is included in the Result type, we can constrain the type of the list included in the Ok type at compile time. This lets us avoid having to regain the generic parameter altogether.

Making the type parameter covariant allows us to pass the Fail value to the handleResult function. Covariance in Result allows passing values of type Result[T], where T is a subtype of String, to the handleResult function. Because Nothing is a subtype of every other type (including String), Fail values, which are of type Result[Nothing], can be passed as a parameter to handleResult. This allows us to keep types such as Fail free of type parameters.
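The subtyping relationship can be checked directly with type ascriptions. The sketch below repeats the definitions from above and adds a hypothetical describe function purely for illustration; the two val lines compile only because Result is covariant in T.

```scala
sealed trait Result[+T]
final case class Ok[+T](values: List[T]) extends Result[T]
final case class Fail(message: String) extends Result[Nothing]

// Accepted because Nothing is a subtype of String and Result is covariant.
val failure: Result[String] = Fail("something happened")
val success: Result[String] = Ok(List("a", "b"))

// Hypothetical helper, used here only to exercise both cases.
def describe(result: Result[String]): String = result match {
  case Fail(message) => "ERROR: " + message
  case Ok(vs)        => "OK: " + vs.mkString(",")
}
```

If the + annotation were removed from Result, the first val would be rejected, since Result[Nothing] would no longer conform to Result[String].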

Avoid matching on generic type parameters

Sometimes you can’t control what type of value gets passed to the function you’re implementing. For example, actors in the Akka framework must handle messages of any type (Any). Since you can’t constrain the incoming values at compile time, you will again lose the ability to distinguish between a List[String] and a List[Int].

def handle(a: Any): Unit = a match {
  case vs: List[String] => println("strings: " + vs.map(_.size).sum)
  case vs: List[Int]    => println("ints: " + vs.sum)
  case _ =>
}

handle(List("hello", "world")) // output: "strings: 10"
handle(List(1, 2, 3))          // ClassCastException... oh no!

In the code above, passing a list of integers to handle matches the first case, because the type parameter is not checked at runtime. The function then treats the list as a list of strings, and fails with a ClassCastException as soon as it reads an element as a String. As explained in the earlier section, we’d like to catch these failures at compile time instead.

If you can control the type of the values passed into the function (e.g. you can control what type of messages you send to your actor), you can avoid the problem by boxing the input which has a type parameter with a container that specifies the type parameter:

case class Strings(values: List[String])
case class Ints(values: List[Int])

def handle(a: Any): Unit = a match {
  case Strings(vs) => println("strings: " + vs.map(_.size).sum)
  case Ints(vs)    => println("ints: " + vs.sum)
  case _ =>
}

handle(Strings(List("hello", "world"))) // output: "strings: 10"
handle(Ints(List(1, 2, 3)))             // output: "ints: 6"
handle(Strings(List("foo", "bar", 4)))  // compile time error

In the example, we define concrete types Strings and Ints to manage the generic types for us. They ensure that you cannot build lists of mixed values, so that the handle function can safely use their lists without having to know the type parameters of the lists at runtime.

Introducing Typeable and TypeCase

Shapeless provides handy tools for dealing with type safe casting: Typeable and TypeCase.

Typeable is a type class that provides the ability to cast values from the Any type to a specific type. The result of the casting operation is an Option where the Some value contains the successfully cast value, and the None value represents a cast failure. Shapeless provides the Typeable capability for all of Scala’s primitive types, case classes, sealed trait hierarchies, and at least some of Scala’s collection types and basic classes out of the box. Here are some examples of Typeable in action:

import shapeless._

case class Person(name: String, age: Int, wage: Double)

val stringTypeable = Typeable[String]
val personTypeable = Typeable[Person]

stringTypeable.cast("foo": Any) // result: Some("foo")
stringTypeable.cast(1: Any)     // result: None
personTypeable.cast(Person("John", 40, 30000.0): Any) // result: Some(...)
personTypeable.cast("John": Any) // result: None

TypeCase bridges Typeable and pattern matching. It’s essentially an extractor for Typeable instances. TypeCase and Typeable allow implementing the example in the previous section without boxing:

import shapeless._

val stringList = TypeCase[List[String]]
val intList    = TypeCase[List[Int]]

def handle(a: Any): Unit = a match {
  case stringList(vs) => println("strings: " + vs.map(_.size).sum)
  case intList(vs)    => println("ints: " + vs.sum)
  case _ =>
}

val ints: List[Int] = Nil

handle(List("hello", "world")) // output: "strings: 10" so far so good
handle(List(1, 2, 3))          // output: "ints: 6" yay!
handle(ints)                   // output: "strings: 0" wait... what? We'll get back to this.

Instead of boxing the list values, the TypeCase instances can be used for pattern matching on the input. TypeCase will automatically use any Typeable instance it can find for the given type to perform the casting operation. If the casting operation fails (produces None), the pattern isn’t matched, and the next pattern is tried.
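To make the "extractor for a type class" idea concrete, here is a minimal sketch that uses only the standard library. SafeCast and CastExtractor are hypothetical stand-ins invented for this illustration; they are not Shapeless’s API or implementation, they only show how an unapply method can delegate to a cast that returns an Option.

```scala
// Hypothetical stand-in for a Typeable-like type class (not Shapeless's API).
trait SafeCast[T] {
  def cast(a: Any): Option[T]
}

object SafeCast {
  implicit val stringCast: SafeCast[String] = new SafeCast[String] {
    def cast(a: Any): Option[String] = a match {
      case s: String => Some(s)
      case _         => None
    }
  }
  implicit val intCast: SafeCast[Int] = new SafeCast[Int] {
    def cast(a: Any): Option[Int] = a match {
      case i: Int => Some(i)
      case _      => None
    }
  }
}

// An extractor that delegates to whichever SafeCast instance is found for T.
class CastExtractor[T](implicit sc: SafeCast[T]) {
  def unapply(a: Any): Option[T] = sc.cast(a)
}

val asString = new CastExtractor[String]
val asInt    = new CastExtractor[Int]

// A None from cast means the pattern does not match and the next one is tried.
def classify(a: Any): String = a match {
  case asString(s) => "string of length " + s.length
  case asInt(i)    => "int " + i
  case _           => "something else"
}
```

Calling classify("abc") goes through the first extractor, classify(42) falls through to the second, and any other value ends up in the default case.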

Keeping the generic type “generic”

While Typeable can be used for pattern matching on specific types, its true power is the ability to pattern match on types where the type parameter of the extracted type is kept generic. Here is an example use of this ability:

import shapeless._

def extractCollection[T: Typeable](a: Any): Option[Iterable[T]] = {
  val list = TypeCase[List[T]]
  val set  = TypeCase[Set[T]]
  a match {
    case list(l) => Some(l)
    case set(s)  => Some(s)
    case _       => None
  }
}

val l1: Any = List(1, 2, 3)
val l2: Any = List[Int]()
val s:  Any = Set(1, 2, 3)

extractCollection[Int](l1)    // Some(List(1, 2, 3))
extractCollection[Int](s)     // Some(Set(1, 2, 3))
extractCollection[String](l1) // None
extractCollection[String](s)  // None
extractCollection[String](l2) // Some(List()) // Shouldn't this be None? We'll get back to this.

In this example, we’ve created a function, extractCollection, to extract all lists and sets of any generic type. We declare that the function type parameter T should have a Typeable instance when we call the function. The presence of the instance allows us to create extractors for a list of T and a set of T. We can then use these extractors to extract only values that conform to the types List[T] or Set[T]. For all the other values, we produce no value.

In order to use the function, we need to give a hint to the compiler what type of values we want to extract. This is done by specifying the type of extracted values as the type parameter for the function.

By keeping the extracted type generic, we’ve successfully separated part of the extraction logic from the type that we want to extract. This allows us to reuse the same function for all types that have a Typeable instance.

Typeable’s secret sauce

While Typeable may look like it has what it takes to solve type erasure, it’s still subject to the same behaviour as any other runtime code. This can be seen in the last lines of the previous code examples where empty lists were recognized as string lists even when they were specified to be integer lists. This is because Typeable casts are based on the values of the list. If the list is empty, then naturally that is a valid string list and a valid integer list (or any other list for that matter). Depending on the use-case, the distinction between different types of empty lists might or might not matter, but it can certainly catch the user off guard, if they’re not familiar with how Typeable operates.
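The empty list behaviour follows directly from checking elements rather than types. The sketch below imitates that idea with the standard library’s ClassTag; castList is a hypothetical helper written for this illustration, not how Shapeless implements Typeable.

```scala
import scala.reflect.ClassTag

// Hypothetical helper: succeeds only if every element is an instance of T.
def castList[T](a: Any)(implicit ct: ClassTag[T]): Option[List[T]] = a match {
  case l: List[_] if l.forall(x => ct.unapply(x).isDefined) =>
    Some(l.asInstanceOf[List[T]])
  case _ => None
}

val emptyInts: Any = List.empty[Int]

val asInts    = castList[Int](List(1, 2, 3))    // Some(List(1, 2, 3))
val asStrings = castList[String](List(1, 2, 3)) // None
// forall is trivially true for an empty list, so any element type is accepted.
val emptyAsStrings = castList[String](emptyInts) // Some(List())
```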

Moreover, since Typeable inspects the values of a collection to determine whether the collection can be cast or not, it will take longer to cast a large collection than a small one. Let’s do some rudimentary profiling to see how the size of the collection affects casting.

import shapeless._

def time[T](f: => T): T = {
  val t0 = System.currentTimeMillis()
  val result = f
  val t1 = System.currentTimeMillis()
  println(s"Elapsed time: ${t1 - t0} ms")
  result
}

val list1 = (1 to 100).toList
val list2 = (1 to 1000).toList
val list3 = (1 to 10000).toList
val list4 = (1 to 100000).toList
val list5 = (1 to 1000000).toList
val list6 = (1 to 10000000).toList

val listTypeable = Typeable[List[Int]]
time { listTypeable.cast(list1: Any) } // 0 ms
time { listTypeable.cast(list2: Any) } // 1 ms
time { listTypeable.cast(list3: Any) } // 5 ms
time { listTypeable.cast(list4: Any) } // 4 ms
time { listTypeable.cast(list5: Any) } // 7 ms
time { listTypeable.cast(list6: Any) } // 70 ms

In the above example, we created lists of various sizes, and measured how long it took to cast them. Notice how the time required increases as the size of the collection expands.

Custom type class instance for Typeable

Occasionally you might encounter a type that you can’t automatically use Typeable against. These are usually standard classes (as opposed to case classes) that have type parameters. For example, you can’t automatically use Typeable on the following type:

class Funky[A, B](val foo: A, val bar: B) {
  override def toString: String = s"Funky($foo, $bar)"
}

Instead, we have to provide our own custom Typeable instance to allow casting Funky values. The Typeable interface requires us to implement two methods: cast and describe. The cast method does the real casting work, while describe provides human-readable information about the type being cast.

implicit def funkyIsTypeable[A: Typeable, B: Typeable]: Typeable[Funky[A, B]] =
  new Typeable[Funky[A, B]] {
    private val typA = Typeable[A]
    private val typB = Typeable[B]

    def cast(t: Any): Option[Funky[A, B]] = {
      if (t == null) None
      else if (t.isInstanceOf[Funky[_, _]]) {
        val o = t.asInstanceOf[Funky[_, _]]
        for {
          _ <- typA.cast(o.foo)
          _ <- typB.cast(o.bar)
        } yield o.asInstanceOf[Funky[A, B]]
      } else None
    }

    def describe: String = s"Funky[${typA.describe}, ${typB.describe}]"
  }

The type class instance is parametrized with Typeable instances for the Funky class’s type parameters. This allows us to use the instance for all Funky types where the type parameters are also Typeable.

With the help of the instance parameters, we can create a cast method that attempts to cast the given value to a Funky[A, B] when it can also cast the values inside Funky.

As we can see from the example, even with two type parameters and two fields, the casting process is already complex. Adding more fields and type parameters requires even more casting steps. Moreover, the casting is not enforced by the compiler (e.g. you can easily miss a casting step for a field), which means that it’s exposed to casting failures.

Type tags

Another way to do type safe casting is to use Scala’s type tags. A type tag is a full type description of a Scala type as a runtime value generated at compile time. Similar to Shapeless, type tags are also materialized through a type class. The type class provides the ability to access the generic type parameter’s type information during runtime.

Unlike in Shapeless, the casting is not based on checking the elements inside the class. It’s instead based on comparing type tags with each other. A type tag can tell us whether its type conforms to the type of another type tag. Type tags allow checking if the types are equal or if there is a subtype relationship.

import scala.reflect.runtime.universe._

def handle[A: TypeTag](a: A): Unit =
  typeOf[A] match {
    case t if t =:= typeOf[List[String]] =>
      // list is a string list
      val r = a.asInstanceOf[List[String]].map(_.length).sum
      println("strings: " + r)

    case t if t =:= typeOf[List[Int]] =>
      // list is an int list
      val r = a.asInstanceOf[List[Int]].sum
      println("ints: " + r)

    case _ => // ignore rest
  }

val ints: List[Int] = Nil

handle(List("hello", "world")) // output: "strings: 10"
handle(List(1, 2, 3))          // output: "ints: 6"
handle(ints)                   // output: "ints: 0" it works!

In the example, we implement the old familiar handle function from previous examples with the help of type tags. We declare the input to have a type tag, and then compare the type to some existing known types: string lists and integer lists. If there’s a match between the types, we can safely perform a cast using asInstanceOf. Since the type parameter is matched by type, an empty list of integers will be recognized as a list of integers instead of a list of strings.
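The handle function above only uses the equality check =:=. The subtype check mentioned earlier is written <:<. Here is a small sketch of how the two differ, using only types from the standard library (the value names are arbitrary):

```scala
import scala.reflect.runtime.universe._

// Equality takes the type parameter into account.
val sameType   = typeOf[List[Int]] =:= typeOf[List[Int]]    // true
val otherParam = typeOf[List[Int]] =:= typeOf[List[String]] // false

// List[Int] conforms to Seq[Int], but the two types are not equal.
val isSubtype = typeOf[List[Int]] <:< typeOf[Seq[Int]]      // true
val isEqual   = typeOf[List[Int]] =:= typeOf[Seq[Int]]      // false
```

Which operator to use depends on whether values of a subtype should be accepted by the cast.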

Type tags and unknown types

The downside of the approach shown in the previous example is that we must provide a type tag to perform the type matching with. In the example, we rely on the type tag instance provided to the function to recover the type information for the generic type. In many cases, the function signature could be limited to just Any => Unit without any type tag information. One way to get around this problem is to provide the type tag information as part of the type.

import scala.reflect.runtime.universe._

class Funky[A, B](val foo: A, val bar: B) {
  override def toString: String = s"Funky($foo, $bar)"
}

final case class FunkyCollection[A: TypeTag, B: TypeTag](funkySeq: Seq[Funky[A, B]]) { // Why final? We'll get back to this.
  val selfTypeTag = typeTag[FunkyCollection[A, B]]

  def hasType[Other: TypeTag]: Boolean =
    typeOf[Other] =:= selfTypeTag.tpe

  def cast[Other: TypeTag]: Option[Other] =
    if (hasType[Other])
      Some(this.asInstanceOf[Other])
    else
      None
}

val a: FunkyCollection[String, Int] = FunkyCollection(Seq(new Funky("foo", 2)))
val b: FunkyCollection[_, _] = a

b.hasType[FunkyCollection[String, Int]] // true
b.hasType[FunkyCollection[Int, String]] // false
b.cast[FunkyCollection[String, Int]]    // Some(a)
b.cast[FunkyCollection[Int, String]]    // None

In this example, we’ve created a wrapper for a collection of Funky objects from the last example. Besides a sequence of Funky objects, constructing a FunkyCollection requires type tags for its type parameters. The type tags are then used to materialize a type tag for the FunkyCollection itself, which can be used for comparing against other types.

The cast method is used for performing a type safe cast based on the relationship of the given type and the type tag stored in the object. If the types match, we can cast the object using asInstanceOf.

Unfortunately, it’s possible to accidentally get the wrong type tag in your object. If another class extends FunkyCollection, the type tag will remain the same in that class. Thus all hasType and cast comparisons will be made against FunkyCollection rather than the type extending FunkyCollection. This can be prevented by overriding the selfTypeTag to use the type tag for the extending class. However, doing so places a hidden requirement on whichever class extends FunkyCollection, which increases the potential for introducing new bugs. Therefore, you may want to close FunkyCollection to extension using the final keyword to prevent the problem.
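The problem is easy to reproduce with a class that is deliberately left open. Tagged and Special below are hypothetical classes made up for this sketch; they follow the same pattern as FunkyCollection but without the final modifier.

```scala
import scala.reflect.runtime.universe._

// Hypothetical, deliberately non-final class that stores its own type tag.
class Tagged[A: TypeTag](val value: A) {
  val selfTypeTag: TypeTag[Tagged[A]] = typeTag[Tagged[A]]

  def hasType[Other: TypeTag]: Boolean =
    typeOf[Other] =:= selfTypeTag.tpe
}

// The subclass inherits selfTypeTag unchanged, so it still describes Tagged[A].
class Special[A: TypeTag](value: A) extends Tagged[A](value)

val special = new Special(1)

val looksLikeTagged  = special.hasType[Tagged[Int]]  // true
val looksLikeSpecial = special.hasType[Special[Int]] // false
```

Even though special is a Special[Int], the stored tag says Tagged[Int], so a cast to the object’s actual class is refused.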

We can also create an extractor based on the cast method.

import scala.reflect.runtime.universe._

object FunkyCollection {
  def extractor[A: TypeTag, B: TypeTag] = new FunkyExtractor[A, B]
}

class FunkyExtractor[A: TypeTag, B: TypeTag] {
  def unapply(a: Any): Option[FunkyCollection[A, B]] = a match {
    case kvs: FunkyCollection[_, _] => kvs.cast[FunkyCollection[A, B]]
    case _ => None
  }
}

val stringIntExt = FunkyCollection.extractor[String, Int]
val a: FunkyCollection[String, Int] = FunkyCollection(Seq(new Funky("foo", 2)))
val b: FunkyCollection[_, _] = a

b match {
  case stringIntExt(collection) =>
    // `collection` has type `FunkyCollection[String, Int]`
    ...

  case _ =>
    ...
}

In the example, we have a special class, FunkyExtractor, that provides the unapply method for extracting FunkyCollection values. The class is parametrized with type tags, which are used in combination with the FunkyCollection type for performing the cast operation on FunkyCollection values.

Extracting the boilerplate

The pattern for embedding a type tag and creating a cast method is pretty much the same across all types. Let’s extract those features into a trait:

import scala.reflect.runtime.universe._

trait TypeTaggedTrait[Self] { self: Self =>
  val selfTypeTag: TypeTag[Self]

  def hasType[Other: TypeTag]: Boolean =
    typeOf[Other] =:= selfTypeTag.tpe

  def cast[Other: TypeTag]: Option[Other] =
    if (hasType[Other])
      Some(this.asInstanceOf[Other])
    else
      None
}

abstract class TypeTagged[Self: TypeTag] extends TypeTaggedTrait[Self] { self: Self =>
  val selfTypeTag: TypeTag[Self] = typeTag[Self]
}

The trait TypeTaggedTrait provides the hasType and cast methods to any type extending it. The methods use the abstract field selfTypeTag to help compare types to the object’s own type. The trait’s type parameter Self is used to represent the type that extends the trait. In order to prevent using types that the trait cannot cast to as the type parameter, the trait requires the trait implementation to extend the type parameter. This is done by adding the type parameter as the self type annotation.

The selfTypeTag for the trait can be provided implicitly using an abstract class TypeTagged. By extending the TypeTagged class, classes can automatically provide the correct type tag through the type parameter.

Now that we have extracted the cast method into its own trait, we can create an extractor class around the trait:

class TypeTaggedExtractor[T: TypeTag] {
  def unapply(a: Any): Option[T] = a match {
    case t: TypeTaggedTrait[_] => t.cast[T]
    case _ => None
  }
}

Like the FunkyExtractor in the previous section, TypeTaggedExtractor creates an instance of an extractor for the given type. The extractor partially pattern matches on the TypeTaggedTrait, and attempts to cast to the given type if it can using the object’s cast method.

Using the trait and the extractor, we can refactor the FunkyCollection to use these generalized features:

object FunkyCollection {
  def extractor[A: TypeTag, B: TypeTag] = new TypeTaggedExtractor[FunkyCollection[A, B]]
}

final case class FunkyCollection[A: TypeTag, B: TypeTag](funkySeq: Seq[Funky[A, B]])
  extends TypeTagged[FunkyCollection[A, B]]

As mentioned in the previous FunkyCollection example, you may want to seal your classes that extend TypeTagged or TypeTaggedTrait to prevent incorrect type tags from appearing as the selfTypeTag.

Casting time relation to input size

Since the casting is based on tags instead of values, the time spent casting should remain roughly the same as the input size grows.

def toFunkyCollection(i: Seq[Int]) = FunkyCollection[String, Int] {
  i.map(v => new Funky(v.toString, v))
}

val coll1: Any = toFunkyCollection((1 to 100).toList)
val coll2: Any = toFunkyCollection((1 to 1000).toList)
val coll3: Any = toFunkyCollection((1 to 10000).toList)
val coll4: Any = toFunkyCollection((1 to 100000).toList)
val coll5: Any = toFunkyCollection((1 to 1000000).toList)
val coll6: Any = toFunkyCollection((1 to 10000000).toList)

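// extractor1 is not defined in the original listing; this definition is an assumption.
val extractor1 = FunkyCollection.extractor[String, Int]

time { extractor1.unapply(coll1) } // 1 ms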
time { extractor1.unapply(coll2) } // 0 ms
time { extractor1.unapply(coll3) } // 0 ms
time { extractor1.unapply(coll4) } // 0 ms
time { extractor1.unapply(coll5) } // 0 ms
time { extractor1.unapply(coll6) } // 0 ms

Here we perform a similar kind of profiling to what we did earlier. The effects of casting can barely be seen at the millisecond scale.

However, it is only fair to point out that type tags may have thread safety or performance issues in multithreaded environments depending on what version of Scala you’re using. In Scala 2.10, type tags are not thread safe. The thread safety issues were fixed in Scala 2.11 by introducing locking in critical places of the reflection API. Because the type tags use synchronization internally, the performance of casting using type tags might be much worse than when using Typeable while casting values concurrently.

Conclusions

In this article, I demonstrated a few ways to get around type erasure when doing pattern matching in Scala. I showed two examples of how to get around the whole issue by restructuring code. I also showed how to do type safe casting using Shapeless’s Typeable and Scala’s type tags.

Refactoring the code to not rely on pattern matching provides the cleanest solution, but it is not always possible. Some libraries, such as Akka, provide APIs that force their users to pattern match on the Any type. When interacting with such a library, the problem can usually be avoided by wrapping types that have type parameters with types that don’t have them.

An alternative approach for pattern matching on generic types is to use Shapeless’s Typeable or Scala’s type tags. Typeable, along with TypeCase, provides an easy-to-use API for performing type safe casting. Its casting mechanism is based on type checking values at runtime, so it doesn’t follow all the compile time semantics of casting. Type tags provide an API for performing type checks at runtime against types lifted into values. Type tags are not as straightforward to use as Typeable, but their type checking is stricter. Type tags also require thread synchronization, while Typeable doesn’t.

I’d like to thank Miles Sabin for pointing out issues of using type tags and everyone who participated in reviewing this article. I’ve uploaded the code examples to GitHub Gist to play around with. Thanks for reading!
