Patrick Ecker

Posted on May 15, 2021

Tagged Unions and ReScript Variants

#rescript #typescript #javascript

In JavaScript there are many situations where we want to express certain shapes of an object based on the conditions of its attributes, e.g.

// Plain JS - Typical Redux Action types
if(action.type === "addUser") {
  const user = action.user;
  createUser(user);
}

if(action.type === "removeUser") {
  const userId = action.userId;
  removeUser(userId);
}

You can find this pattern in many other scenarios, such as representing the method of a request (req.method === "POST" -> req.body != null), representing UI state (userReq.isLoading -> userReq.name == undefined), or even error state (result.err != null -> result.msg != undefined). The shape of the object differs depending on the state of its attributes, as defined by a specific ruleset.

In TypeScript, we'd use a so-called Discriminated Union Type (also known as a Tagged Union) to encode the conditional object shape within the type itself. For our previous example, we would define a type for a user action like this:

// TypeScript

type AddUser = {
  type: "addUser",
  user: User
};

type RemoveUser = {
  type: "removeUser",
  userId: string
};

type UserAction = AddUser | RemoveUser;
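To show the narrowing in action, here is a minimal sketch of a reducer over UserAction. The post doesn't define User, so its shape below is an assumption, and usersReducer is a hypothetical name. Checking action.type narrows action to AddUser or RemoveUser, so each branch can only access the fields of its own case.

```typescript
// Assumption: the post doesn't define `User`; this minimal shape is hypothetical.
type User = { id: string; name: string };

type AddUser = { type: "addUser"; user: User };
type RemoveUser = { type: "removeUser"; userId: string };
type UserAction = AddUser | RemoveUser;

// Hypothetical reducer: inside each case, `action` is narrowed by its `type` tag,
// so `action.user` and `action.userId` only type-check in their own branch.
function usersReducer(state: User[], action: UserAction): User[] {
  switch (action.type) {
    case "addUser":
      return [...state, action.user];
    case "removeUser":
      return state.filter((u) => u.id !== action.userId);
  }
}

// Usage: add a user, then remove it again.
const withAda = usersReducer([], { type: "addUser", user: { id: "1", name: "Ada" } });
const withoutAda = usersReducer(withAda, { type: "removeUser", userId: "1" });
console.log(withAda.length, withoutAda.length);
```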

As a ReScript developer, you have probably had trouble writing FFI (interop) code to represent such Tagged Unions. How can we handle these data structures without changing their JS representation?

Usually we'd define a variant to represent different shapes of data, but unfortunately variants do not compile to the same shape as user-defined Tagged Unions.

This article demonstrates in a practical example how we'd map data structures for RichText data (designed as a Tagged Union) to ReScript variants.

Important: We'll only discuss mapping ReScript variants to immutable JS values, since later mutations of the original values may not be reflected in the already classified variants at runtime. Handling mutable data requires a different strategy, which is not covered in this post.

Background on the Use-Case

This post is based on a real-world use case where I needed to represent Storyblok CMS' RichText data structures within ReScript, but couldn't find any proper documentation on how to do this.

I tried to keep the data model simple to capture only the basic concepts. For a more thorough side-by-side implementation of a TS / ReScript Storyblok RichText model, including rendering logic, you can check out this repository later on.

Design RichText Data with TypeScript

To kick things off, we'll define some basic RichText elements we want to be able to represent: Text, Paragraph and Doc. These will be defined as a Tagged Union called RichText:

interface Text {
  type: "text";
  text: string;
}

interface Paragraph {
  type: "paragraph";
  content: RichText[];
}

interface Doc {
  type: "doc";
  content: RichText[];
}

export type RichText =
  | Doc
  | Text
  | Paragraph;

Each case of the RichText type listed above has one common attribute, type, which helps the type system differentiate the shape of a given value by checking value.type, e.g. via an if or switch statement. Let's see that in action:

// Recursively iterate through the RichText tree and print all Text.text contents
function printTexts(input: RichText) {
  switch (input.type) {
    case "doc":
    case "paragraph":
      return input.content.forEach(printTexts);
    case "text": {
      console.log(input.text);
      break;
    }
  }
}

const input: RichText =   {
    type: "doc",
    content: [
      {
        type: "paragraph",
        content: [{type: "text", "text": "text 1"}]
      },
      {
        type: "paragraph",
        content: [{type: "text", "text": "text 2"}]
      }
    ]
  };

printTexts(input);

TypeScript will be able to infer the relevant data for each case correctly most of the time.

There are a few things I personally dislike in TS when handling Tagged Unions (especially via switch statements):

switch statements are not expressions (can't return a value without wrapping a function around it)
cases need extra braces to scope variable declarations to a single case, and need a break / return statement to prevent case fall-through
Without any return statements or other trickery, TS apparently does not do any exhaustive checks within switches
Discriminated union types are really noisy in type space code and I often had a hard time navigating / writing types, even in smaller codebases
switch statements can only match one value at once. More complex discriminants / multiple discriminants are impractical
object types are structurally typed, and TS will not always infer the type correctly without a type annotation (as seen in the const input declaration above). This also makes error messages generally harder to read.

... but these are all just opinions.
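For comparison, a common TS workaround for the missing exhaustiveness check is an assertNever helper in the default branch, combined with a function that has an explicit return type. The following is a minimal sketch under that assumption: assertNever and countTexts are hypothetical names, and the node interfaces are renamed (TextNode, ParagraphNode, DocNode) so they don't clash with the DOM's global Text type.

```typescript
interface TextNode { type: "text"; text: string }
interface ParagraphNode { type: "paragraph"; content: RichText[] }
interface DocNode { type: "doc"; content: RichText[] }
type RichText = DocNode | TextNode | ParagraphNode;

// Hypothetical helper: if every case is handled, `value` has type `never` here.
// Adding a new member to RichText without a matching case turns the call into a compile error.
function assertNever(value: never): never {
  throw new Error(`Unhandled RichText case: ${JSON.stringify(value)}`);
}

// Wrapping the switch in a function with a return type lets it produce a value.
function countTexts(input: RichText): number {
  switch (input.type) {
    case "doc":
    case "paragraph":
      return input.content.reduce((sum, child) => sum + countTexts(child), 0);
    case "text":
      return 1;
    default:
      return assertNever(input);
  }
}
```

It works, but it's still extra ceremony on top of the switch statement.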

In the next step, let's discover how we'd represent that data model in ReScript.

Representing Tagged Unions in ReScript

We now have an existing RichText representation, and we want to write ReScript FFI (interop) code to represent the same data without changing the JS parts.

ReScript's type system can't express Tagged Unions in the same way as TypeScript does, so let's take a step back:

The core idea of Tagged Unions is to express an "A or B or C" relation and to access different data depending on which branch we are currently handling. This is exactly what ReScript variants are made for.

So let's design the previous example with the help of variants. We'll start by defining our type model within our RichText.res module:

// RichText.res

module Text = {
  type t = {text: string};
};

type t;

type case =
  | Doc(array<t>)
  | Text(Text.t)
  | Paragraph(array<t>)
  | Unknown(t);

As you can see, there's not much going on here. Let's quickly go through it:

We defined a submodule Text, with a type t representing a Text RichText element. We refer to this type via Text.t.
type t; represents our actual Tagged Union RichText element. It doesn't have any concrete shape, which makes it an "abstract type". We'll also call this type RichText.t later on.
Lastly, we defined our case variant, describing all the different cases as defined by the Tagged Union in TS. Note how we also added an Unknown(t) case to be able to represent malformed / unknown RichText elements as well.

With these types we can fully represent our data model, but we still need to classify incoming JS data into our specific cases. As a quick reminder: the RichText.t type internally represents a JS object with the following shape:

{
   type: string,
   content?: ..., // exists if type = "doc" | "paragraph"
   text?: ...,    // exists if type = "text"
}

Let's add some more functionality to reflect that logic.

Classifying RichText.t data

We will extend our RichText.res module with the following functions:

// RichText.res

module Text = {
  type t = {text: string};
};

type t;

type case =
  | Doc(array<t>)
  | Text(Text.t)
  | Paragraph(array<t>)
  | Unknown(t);

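let getType: t => string = %raw(`
    function(value) {
      // typeof null === "object", so null has to be excluded explicitly
      if(typeof value === "object" && value !== null && value.type != null) {
        return value.type;
      }
      return "unknown";
    }`)

let getContent: t => array<t> = %raw(`
    function(value) {
      // only return content if it really is an array, matching array<t>
      if(typeof value === "object" && value !== null && Array.isArray(value.content)) {
        return value.content;
      }
      return [];
    }`)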

let classify = (v: t): case =>
  switch v->getType {
  | "doc" => Doc(v->getContent)
  | "text" => Text(v->Obj.magic)
  | "paragraph" => Paragraph(v->getContent)
  | "unknown"
  | _ => Unknown(v)
  };

The code above shows everything we need to handle incoming RichText.t values.

Since we are internally handling a JS object and need access to the type and content attributes, we defined two unsafe raw functions, getType and getContent. Both functions receive a RichText.t value and extract the appropriate attribute while checking that the data is correctly shaped; otherwise getType returns "unknown" (which classify turns into an Unknown value) and getContent returns an empty array.

Now with those two functions in place, we are able to define the classify function to refine our RichText.t into case values. It first retrieves the type of the input v and returns the appropriate variant constructor (with the correct payload). Since this code uses raw functions and relies on Obj.magic, it is considered to be unsafe code. For this particular scenario, the unsafe code is at least isolated in the RichText module (make sure to write tests!).

Note: You might have noticed that we store the content part of a "doc" object directly in the Doc(array<t>) variant constructor. Since we know that our Doc model does not contain any other information, we went ahead and made our model simpler instead.
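For readers coming from TS, the same classify idea can be sketched in TypeScript on top of an unknown input. This is not the post's ReScript code but a hypothetical translation: Classified, getType, getContent and classify are illustrative names, and the text case checks that text is a string instead of trusting the input the way Obj.magic does.

```typescript
// Hypothetical TS counterpart of RichText.case, including an explicit unknown case.
type Classified =
  | { kind: "doc"; content: unknown[] }
  | { kind: "text"; text: string }
  | { kind: "paragraph"; content: unknown[] }
  | { kind: "unknown"; value: unknown };

// Mirrors the %raw getType: returns the string `type` field, or "unknown".
function getType(value: unknown): string {
  if (typeof value === "object" && value !== null && "type" in value) {
    const t = (value as { type: unknown }).type;
    if (typeof t === "string") return t;
  }
  return "unknown";
}

// Mirrors the %raw getContent: returns `content` if it is an array, otherwise [].
function getContent(value: unknown): unknown[] {
  if (typeof value === "object" && value !== null && "content" in value) {
    const c = (value as { content: unknown }).content;
    if (Array.isArray(c)) return c;
  }
  return [];
}

function classify(value: unknown): Classified {
  switch (getType(value)) {
    case "doc":
      return { kind: "doc", content: getContent(value) };
    case "paragraph":
      return { kind: "paragraph", content: getContent(value) };
    case "text": {
      // getType only returns "text" for non-null objects, so this cast is safe.
      const text = (value as { text?: unknown }).text;
      // Stricter than Obj.magic: a "text" node without a string text is treated as unknown.
      return typeof text === "string" ? { kind: "text", text } : { kind: "unknown", value };
    }
    default:
      return { kind: "unknown", value };
  }
}
```

Like the ReScript version, anything that doesn't match a known tag ends up in the unknown case instead of throwing.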

Using the RichText module

Now with the implementation in place, let's showcase how we'd iterate over RichText data and print every Text content within all paragraphs:

// MyApp.res

// We simulate some JS object coming into our system
// ready to be parsed
let input: RichText.t = %raw(`
  {
    type: "doc",
    content: [
      {
        type: "paragraph",
        content: [{type: "text", "text": "text 1"}]
      },
      {
        type: "paragraph",
        content: [{type: "text", "text": "text 2"}]
      }
    ]
  }`)

// keyword rec means that this function is recursive
let rec printTexts = (input: RichText.t) => {
  switch (RichText.classify(input)) {
  | Doc(content)
  | Paragraph(content) => Belt.Array.forEach(content, printTexts)
  | Text({text}) => Js.log(text)
  | Unknown(value) => Js.log2("Unknown value found: ", value)
  };
};

printTexts(input);

As you can see in the printTexts function above, we call RichText.classify on the input parameter. In the Doc | Paragraph branch we can safely unify the content payloads (both are of type array<RichText.t>) and recursively call printTexts again. In case of a Text element, we can directly destructure the record attribute RichText.Text.text, and in the Unknown case we log the value of type RichText.t, which is the original JS object (Js.log can log any value, regardless of its type).

In contrast to the TS switch statement, let's look at the control flow structure used here (namely the ReScript switch expression):

A switch is an expression. The last expression of each branch is its return value. You can even assign it to a binding (let myValue = switch("test") {...})
Each branch must return the same type (forces simpler designs)

The most important part is that we have the full power of pattern matching, which can be performed on any ReScript data structure (numbers, records, variants, tuples, ...). Here is just one small example:

switch (RichText.classify(input)) {
| Doc([]) => Js.log("This document is empty")
| Doc(content) => Belt.Array.forEach(content, printTexts)
| Text({text: "text 1"}) => Js.log("We ignore 'text 1'")
| Text({text}) => Js.log("Text we accept: " ++ text)
| _ => () /* "Do nothing" */
};

Doc([]): "Match on all Doc elements with 0 elements in their content"
Doc(content): "For every other content (> 0) do the following..."
Text({text: "text 1"}): "Match on all Text elements where element.text = 'text 1'"
Text({text}): "For every other Text element with a different text do the following ..."
_ => (): "For everything else _ do nothing ()"
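For comparison, here is a hand-written TS sketch of the same match, using the plain RichText union from the beginning (nodes renamed to avoid the DOM's global Text type). To keep it testable, describeNode is a hypothetical function that returns a message instead of logging, and it doesn't recurse into Doc content. Each nested pattern becomes an explicit condition, and, just like pattern order in ReScript, the order of the checks matters.

```typescript
type TextNode = { type: "text"; text: string };
type ParagraphNode = { type: "paragraph"; content: RichText[] };
type DocNode = { type: "doc"; content: RichText[] };
type RichText = DocNode | TextNode | ParagraphNode;

function describeNode(input: RichText): string {
  // Doc([])
  if (input.type === "doc" && input.content.length === 0) return "This document is empty";
  // Doc(content)
  if (input.type === "doc") return `Document with ${input.content.length} element(s)`;
  // Text({text: "text 1"})
  if (input.type === "text" && input.text === "text 1") return "We ignore 'text 1'";
  // Text({text})
  if (input.type === "text") return "Text we accept: " + input.text;
  // _ => (): everything else does nothing
  return "";
}
```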

Extending the RichText data model

Whenever we want to extend our data model, we just add a new variant constructor to our case variant, and add a new pattern match within our classify function. E.g.

type case =
  | Doc(array<t>)
  | Text(Text.t)
  | Paragraph(array<t>)
  | BulletList(array<t>) // <-- add a constructor here!
  | Unknown(t);

let classify = (v: t): case =>
  switch (v->getType) {
  | "doc" => Doc(v->getContent)
  | "text" => Text(v->Obj.magic)
  | "paragraph" => Paragraph(v->getContent)
  | "bullet_list" => BulletList(v->getContent) // <-- add a case here!
  | "unknown"
  | _ => Unknown(v)
  };

It's that easy.

Note on Runtime Overhead

It's worth noting that our RichText module approach introduces the following overhead:

Variants with payloads are represented as arrays, so every classify call creates a new array containing the variant's payload (on top of the cost of the extra classify call itself).
Our getContent and getType functions do extra checks on the structure of each input value.

Please note that the ReScript compiler team is currently investigating a better runtime representation for variants, to map more seamlessly to JS and improve performance in the future.

Note on Recursion

I am aware that the examples used in this article are not stack-safe. This means that you can blow your call stack when the data is nested deeply enough to cause many recursive calls. There are ways to make the examples stack-safe; I just tried to keep them simple.
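One way to make the traversal stack-safe is to replace recursion with an explicit stack. The following TS sketch shows how that could look; collectTexts is a hypothetical name, and it works on the plain TS RichText union (nodes renamed to avoid the DOM's global Text type), collecting Text.text values in document order.

```typescript
type TextNode = { type: "text"; text: string };
type ParagraphNode = { type: "paragraph"; content: RichText[] };
type DocNode = { type: "doc"; content: RichText[] };
type RichText = DocNode | TextNode | ParagraphNode;

// Depth-first traversal without recursion: pending nodes live in an array on the heap.
function collectTexts(root: RichText): string[] {
  const texts: string[] = [];
  const stack: RichText[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    if (node.type === "text") {
      texts.push(node.text);
    } else {
      // Push children in reverse so the first child is popped (and visited) first.
      for (let i = node.content.length - 1; i >= 0; i--) {
        stack.push(node.content[i]);
      }
    }
  }
  return texts;
}
```

Since the pending nodes are stored in a regular array, the nesting depth is no longer limited by the call stack.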

Conclusion

We started out by defining a very simple version of (Storyblok based) RichText data structures in TypeScript and highlighted some aspects of Discriminated Unions / Tagged Unions.

Later on we created FFI code wrapping variants around the same RichText data structures. We created a RichText.res module, defined a data model with a case variant and a classify function to parse incoming data, and used pattern matching to access the data in a very ergonomic way.

We only scratched the surface here. I hope this article gave you an idea of how to design your own ReScript modules to tackle similar problems!

In case you are interested in more ReScript-related topics, make sure to follow me on Twitter.

Special thanks to hesxenon and cristianoc for the extensive technical reviews and discussions!

Top comments (1)

Sergey Samokhov • May 15 '21

Thank you, Patrick, for this down-to-earth writeup. Nice balance of safety and practicality.

Without any return statements or other trickery, TS apparently does not do any exhaustive checks within switches

Yeah, the shortest way I've found was an IIFE and a type annotation, i.e.:

let foo: Foo = (() => {
  switch (expr.kind) {
    case "doc": return ...;
    case "text": return ...;
    case "paragraph": return ...;
  }
})();

Workable, but much noisier than ReScript. And the lack of nested patterns is no fun. So that's 1 for ReScript :)
