DEV Community

K

Tag Your Unions Before You Wreck Your Unions

Cover image by Paul Gorbould on Flickr.

Tagged union, discriminated union, disjoint union, variant, variant record, or sum type: different names, similar concept. But what is it all about, and how do tagged unions differ from regular ones?

Untagged Unions

If you are coming from a statically typed language like C, you probably already know about unions: a basic way to store data of different types in the same memory space. They are sometimes also called untagged unions.

An example in C could look like this:

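#include <stdio.h>
#include <string.h>

union MyUnion {
   int number;
   char text[20];
};

int main(void) {
   union MyUnion x;

   /* Store an int in the union. */
   x.number = 2;
   printf("x.number: %d\n", x.number);

   /* The string reuses the same memory, so x.number is no longer valid. */
   strcpy(x.text, "Hello, world!");
   printf("x.text: %s\n", x.text);

   return 0;
}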

The size of x in memory is the size of the largest member that MyUnion can store (plus any padding). It looks a bit like a struct, but if you write a value to one field, it overwrites the memory of the other fields. The basic idea behind this is to save space. It also makes languages like C a tiny bit more dynamic, because one variable can now store different types.

As you can probably imagine, this can also be used to store different types of structs in one memory space.

The problem with unions is that the type checker doesn't care what you are doing.

If you declare an int x, the type checker will complain if you try to put a string inside of it.

If you declare a union MyUnion x, the type checker won't keep track of what you are storing, since that depends on what happens at runtime. You have to check in your own program logic whether it's okay to access x.number or x.text.

How is this related to JavaScript?

Well, in JavaScript, you can't type your variables, which allows you to store anything in them.

let x = 2;
console.log("Number:", x);

x = "Hello, world!";
console.log("Text", x);

This can be rather convenient, because if your data structure changes, you can still put it inside the same variables without caring about the types.

The problems arise when your data structures get a bit more complex.

let x = {
  httpMethod: "GET",
  path: "/users",
  queryParams: { id: 10 }
};
console.log("ID:", x.queryParams.id);

x = {
  httpMethod: "POST",
  path: "/users",
  body: { name: "Jane" }
};
console.log("Name:", x.body.name);

As you can see, a GET request comes with a queryParams field and a POST request comes with a body field. The path is the same, but some parts differ.

You can use the httpMethod field to check which kind of request it is, but you have to do it yourself. If you get this wrong, you could end up accessing x.body.name in a GET request and everything blows up, because x.body is undefined.
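To make that failure concrete, here is a minimal sketch in TypeScript. The LooseRequest type and the readName function are names I made up for illustration; the type is deliberately loose and the non-null assertion switches the compiler check off, so the code behaves like plain JavaScript would.

```typescript
// Hypothetical, deliberately loose request shape: every variant-specific
// field is optional, so nothing ties queryParams or body to httpMethod.
type LooseRequest = {
  httpMethod: string;
  path: string;
  queryParams?: { id: number };
  body?: { name: string };
};

// A handler that forgets to look at httpMethod before reading body.
function readName(request: LooseRequest): string {
  // The non-null assertion (!) silences the compiler, which is roughly
  // the situation you are in with untyped JavaScript.
  return request.body!.name;
}

const getRequest: LooseRequest = {
  httpMethod: "GET",
  path: "/users",
  queryParams: { id: 10 },
};

let failure = "none";
try {
  readName(getRequest);
} catch (error) {
  // Reading a property of undefined throws a TypeError at runtime.
  failure = error instanceof TypeError ? "TypeError" : "other";
}
console.log("Reading body on a GET request failed with:", failure);
```

Nothing stops this code from shipping. The mistake only shows up once a GET request actually reaches readName.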

If you have used JavaScript for a while, you have probably noticed that basically all data is an untagged union. Most of the time you just store one type of data in a variable, but more often than not you end up pushing around objects that are kinda the same but differ in some fields, like the request example above.

Tagged Unions

So what's the idea about tagged unions?

They let you define the differences of your unions with the help of a static type system.

What does this mean?

As I explained with the request example, you often have a bunch of different data types that arrive in one variable, like a function argument. They are either basically the same and vary in a few fields, or they are entirely different. If you want to be sure you don't access data that isn't there, and prevent the infamous "is undefined" errors, you have to check inside the program code at runtime.

Such a check could look like this:

function handle(request) {
  if (request.httpMethod === "GET") console.log(request.queryParams.id);
}

You could also check the queryParams object directly, but nobody forces you to do so. This is completely in your hands and could fail one day in production.
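Checking the fields directly could look roughly like the sketch below. LooseRequest and summarize are hypothetical names, and the third branch is my own addition to show what is left over when neither field is present.

```typescript
// Hypothetical loose request shape, as in plain JavaScript objects.
type LooseRequest = {
  httpMethod: string;
  path: string;
  queryParams?: { id: number };
  body?: { name: string };
};

// Inspect the fields themselves instead of trusting httpMethod.
function summarize(request: LooseRequest): string {
  if (request.queryParams !== undefined) {
    return "ID: " + request.queryParams.id;
  }
  if (request.body !== undefined) {
    return "Name: " + request.body.name;
  }
  // Neither field is present; the caller has to decide what that means.
  return "Unknown request shape";
}

console.log(summarize({ httpMethod: "GET", path: "/users", queryParams: { id: 10 } }));
console.log(summarize({ httpMethod: "POST", path: "/users", body: { name: "Jane" } }));
```

This works, but it is still a convention you have to remember in every function that touches a request.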

Languages with tagged unions in their type-system allow you to make this check at compile time. Reason is such a language.

An example of a request type could look like this:

type body = {name: string};
type queryParams = {id: string};
type httpMethod = GET(queryParams) | POST(body);

type request = {
  path: string,
  httpMethod: httpMethod
};

Now the data is encapsulated inside a tagged union (called variant in Reason), which is the httpMethod type at the top.

If the content of httpMethod is GET, you don't even get access to a body, which could have (and often has) an entirely different structure from queryParams.

A usage example could look like this:

let handleRequest = (req: request) => 
  switch (req.httpMethod) {
  | GET(query) => Js.log("GET " ++ req.path ++ " ID:" ++ query.id)
  | POST(body) => Js.log("POST " ++ req.path ++ " Name:" ++ body.name)
  };

What does this do? It types the req argument as request. Since req.httpMethod is a variant (= tagged union), we can use switch to do things for the different types in that variant.

Many languages that have tagged unions even force you to do things for every possibility. This seems strange at first, but it can help later. If someone changes that tagged union, which can be defined somewhere else in the code, the type-checker will tell you that you need to do something for the new type in that union. This could be forgotten if done manually.
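TypeScript doesn't force this on you by default, but you can get close. The following is a minimal sketch, assuming a hypothetical kind field as the tag and a helper I called assertNever. Neither name comes from the Reason example above.

```typescript
// A tagged union in TypeScript: the literal "kind" field is the tag.
type HttpMethod =
  | { kind: "GET"; queryParams: { id: string } }
  | { kind: "POST"; body: { name: string } };

// Hypothetical helper: it only accepts values of type never, so calling it
// with anything else is a compile-time error.
function assertNever(value: never): never {
  throw new Error("Unhandled case: " + JSON.stringify(value));
}

function label(method: HttpMethod): string {
  switch (method.kind) {
    case "GET":
      return "GET ID:" + method.queryParams.id;
    case "POST":
      return "POST Name:" + method.body.name;
    default:
      // If someone adds a new member to HttpMethod (say, a "DELETE" case),
      // method is no longer never here and this line stops compiling.
      return assertNever(method);
  }
}

console.log(label({ kind: "GET", queryParams: { id: "10" } }));
console.log(label({ kind: "POST", body: { name: "Jane" } }));
```

The default branch is the part that does the work: it turns a forgotten case into a type error instead of a surprise at runtime.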

Conclusion

Tagged unions are a nice way to store different data-types inside of one variable without losing track of their structure. This allows code to be written more like in a dynamically typed language while giving it more safety in the long run.

Reason is such a language. It tries to make concepts like tagged unions, called variants in Reason, accessible to JavaScript developers by delivering them with a familiar syntax.

TypeScript has tagged unions too, if you aren't into that whole FP thingy.
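As a rough sketch of what that could look like, here is the JavaScript request example from earlier with types added. The type names GetRequest, PostRequest, and ApiRequest are my own; the httpMethod field doubles as the tag.

```typescript
// Each variant carries a literal httpMethod, which serves as the tag.
type GetRequest = { httpMethod: "GET"; path: string; queryParams: { id: number } };
type PostRequest = { httpMethod: "POST"; path: string; body: { name: string } };
type ApiRequest = GetRequest | PostRequest;

function handleRequest(request: ApiRequest): string {
  if (request.httpMethod === "GET") {
    // Inside this branch the compiler treats request as a GetRequest,
    // so reading request.body here would not compile.
    return `GET ${request.path} ID: ${request.queryParams.id}`;
  }
  // Only PostRequest is left, so body is known to exist.
  return `POST ${request.path} Name: ${request.body.name}`;
}

console.log(handleRequest({ httpMethod: "GET", path: "/users", queryParams: { id: 10 } }));
console.log(handleRequest({ httpMethod: "POST", path: "/users", body: { name: "Jane" } }));
```

The objects look exactly like the plain JavaScript ones. The only difference is that the type checker now knows which fields belong to which httpMethod.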

Oldest comments (8)

Joe Clay

Nice article :)

Tagged unions are the number one thing I miss in most of the 'big' programming languages. Being able to represent the state of something in a way the compiler can verify for you is so nice!

K

Same here.

I was never a friend of statically typed languages. This feature is the first time I see real value!

Joe Clay • Edited

Yeah, I think the things that finally sold me on it were Scott Wlaschin's 'Designing With Types' blog posts and Richard Feldman's 'Making Impossible States Impossible' talk. Both well worth a read/watch if you've not already seen them :)

K

Yes, same here. One of the best resources about that topic, I think!

Andrew Buntine

Nice article. The Rust language also has great support for tagged unions with its enum construct. It also enforces (at compile time) that all possibilities are catered for.

Enforcing safety as a formality in a language is a fantastic feature. It prevents the laziness that we often resort to when dealing with less strict languages.

Yawar Amin • Edited

TypeScript has untagged sum types, but it's kind of a pain to handle the cases because you have to write the runtime checks manually:

type Get = {queryParams: string};
type Put = {body: string};
type HttpMethod = Get | Put;

namespace HttpMethod {
  export function isGet(httpMethod: HttpMethod): httpMethod is Get {
    return (<Get>httpMethod).queryParams !== undefined;
  }
}

type Req = {path: string, httpMethod: HttpMethod};

function handleRequest(req: Req): void {
  if (HttpMethod.isGet(req.httpMethod))
    console.log(`GET ${req.path} params: ${req.httpMethod.queryParams}`);
  else
    console.log(`PUT ${req.path} body: ${req.httpMethod.body}`);
}

Oh, and the checks would be different if the cases were primitive types or function-constructed values.

IMO tagged unions are much quicker to handle precisely because the tags (i.e. data constructors) are first-class language entities.

Christian Bewernitz

You can "simplify" the type checking part by making the method part of your type definition and setting its type to the string literal:

type Get = {
  method: 'GET';
...

Inside of an if/switch-case that checks method, TypeScript only allows access to the proper fields.
If you cover all cases in a switch, the type of the variable inside default will be never.
TypeScript calls this "Discriminated Unions":
typescriptlang.org/docs/handbook/a...
(There is no way to link directly to the section, you need to search for the term on the page.)