λ➕ Sum Types in relational databases: Yet another (IMO pretty neat) way to model them in SQL

#sql #postgres #database #functional

Sum types

Sum types are an incredibly useful data-modelling feature that is predominantly used in functional programming languages. For example in Elm we can write something like:

type Animal
    = Dog { name : String, age : Integer }
    | Bird { song : String }

(It's a silly example, I know...)

The type we defined here restricts the possible values of an Animal to exactly Dog or Bird with the respective attributes. Those restrictions can make it much easier to reason about code and to check it for correctness. We can extend the type by as many constructors as we need (anyone up for Cat?).

Relational models for sum types

So how do we model this kind of type in SQL? There is no perfect solution, but you can find many good suggestions here:

While modeling a sum type with several normalized tables is arguably cleaner from a relational standpoint (e.g. one central table for Animal and related ones for Dog and Bird), it comes with a lot of complexity.

Yet another way to model sum types in SQL

The solution I want to demonstrate here is the 'nullable columns' approach from the blog posts above, with a more compact formulation of the not null constraints.

Data definition

create type animal_type
    as enum
        ( 'dog'
        , 'bird'
        );

create table animals
    ( animal_id serial primary key
    , animal_type animal_type not null
    , dog_name text
        check ((animal_type = 'dog') = (dog_name is not null))
    , dog_age integer
        check ((animal_type = 'dog') = (dog_age is not null))
    , bird_song text
        check ((animal_type = 'bird') = (bird_song is not null))
    );

First, we created a new Postgres type in which we exhaustively enumerate all constructors (dog and bird in our example).

Then we define a table where the constructor is tracked in the animal_type column. The fields of the constructors are all listed in the same table. With the constraint check ((animal_type = 'dog') = (dog_name is not null)), we make sure that the column is not null when the animal_type is dog, and that it is always null otherwise - i.e. only valid combinations.

Referring to other columns in a check constraint of a column (as we do here with animal_type) is non-standard SQL and will not be portable. I opted for it in this case, as having the constraints attached to each column makes it quite obvious that they complete and correct.

Data modification

Inserting values into this table is straightforward:

insert into animals(animal_type, bird_song) values ('bird', 'lalala');
insert into animals(animal_type, dog_name, dog_age) values ('dog', 'rex', 7);

Note that the null columns will not waste any significant amount of space, as Postgres will efficiently pack each record.

The constraints we set will make sure that only valid values can be inserted or updated:

-- This will fail as we also set the `dog_name`attribute, which belongs
-- to another constructor
insert into animals(animal_type, bird_song, dog_name)
    values ('bird', 'lalala', 'rex');

-- Fails as the `dog_age` column is missing, but it's required for `dog`
insert into animals(animal_type, dog_name) values ('dog', 'rex');

Data query

Querying the table is simple:

select * from animals;

select animal_id, dog_name, dog_age from animals where animal_type = 'dog';

On the functional side, decoding the rows back into sum types is relatively simple, we can either use a oneOf construct or something like Decode.string "animal_type" |> Decode.andThen ... in Elm.

Discussion

There is no perfect way to model sum types in relational databases. Some are very complex to set up and handle, others are more pragmatic and simpler to handle, but less normalized.

The solution proposed here will not bring any joy to normalization affictionadoes, but it is easy to handle and, due to its simplicity, also performs well.

What do you think?