stereobooster

Posted on Apr 20, 2019 • Edited on Apr 22, 2019

Sets, sets are everywhere

#typescript #python #javascript

I strongly believe that it is ok to learn on the go (unless you have a life-critical profession, like a doctor of medicine or you writing software for auto-pilot). If you learn by doing you mostly get practical knowledge. Sometimes it is useful to pause and to summarise practical knowledge down to theory or study theory if this area is already well known.

In math (not always) they tend to teach theory without explaining what practical use of it.

In this post, I want to talk about the mathematical subject, which may seem as irrelevant for a developer at first glance - sets. I will explain basic ideas and how they appear in development in many different places, for example as a data structure, in types, in RDBMS.

I will give a short theoretical definition followed by a practical example. I took some of the definitions from the book The Grammar of Graphics by Leland Wilkinson.

A set

A set is a collection of unique objects. An object in a set is called an element or a member of the set. If we agree to denote sets by capital letters (for example, X) and members of the set by lower-case letters (for example x), then mathematical notation for "x is an element of X" would be

x ∈ X, where  x = 1 and X = {1}

Set as a data structure

In some programming languages set is a built-in data structure, for example in JS:

const x = 1
const X = new Set([1]);
X.has(x)

or in Python

x = 1
X = set([1])
print(x in X)

In practice, it can be used to find all unique values in a list, for example in JS

const unique = [...new Set(list)];

Null set and numbers

The null set (∅ or {}) is an empty set or set with no elements. In math, they denote set of real numbers by R (ℝ), the integers by Z (ℤ), natural numbers by N (ℕ)

Types as sets

Types can be modeled as sets (at least to some extent). We can say that type is a set of all values of this type, for example when we say that some variable is of type integer, we mean that it can take all values of the set of integer numbers Z (obviously with limitation of computer memory).

The empty set would correspond to type with no values. What can it be? It is a type of function with no return value (some people would call it procedure), for example in TypeScript:

function hello(name: string): void {
  console.log(`Hello, ${name}!`);
}

Don't be confused by the undefined in case of JavaScript or null in different languages. We say that void corresponds to the empty set, but in practice it may have one (or two in JavaScript, depending on how you treat it) value undefined/null, this is done for convenience, so that value which represents the absence of value would be possible to pass around as the result of function or variable. There are different approaches to represent the absence of a value, for example Maybe monad or Null Object Pattern.

Side note: Also in TypeScript, they have a real empty set

let x: never;
x = undefined; // error

it gets very confusing at this point, but don't be. In math, there is a straightforward definition but in JavaScript, they made an error by introducing two nulls (undefined and null). We need null for convenience otherwise we would be forced to use Maybe monad or similar solution. TypeScript is a superset of JavaScript, so it inherits its all quirks, but also because it is a modern type system they introduced proper empty set, for example, to use in an exhaustiveness checking.

Subset, superset

If every element of set A is also an element of a set B, then A is a subset of B, denoted A ⊆ B. If A is a subset of B, but there is at least one element of B that is not in A, then A is proper subset of B, denoted A ⊂ B. As well we can define the reverse version of subset, called superset and denoted ⊇ and proper superset denoted as ⊃.

In JavaScript

const Zet = require('zet');
const A = new Zet([1, 2, 3]);
const B = new Zet([3, 4, 5]);
A.subset(B)

In Python

A = {1,2,3}
B = {3,4,5}
A <= B

TypeScript as a superset of JavaScript

What do they mean when they say that TypeScript is a superset of JavaScript? They mean that if we model JavaScript and TypeScript languages as a set of all syntactically correct programs (in those languages), one of the sets would be a subset of the other one:

TypeScript ⊃ JavaScript

The Liskov Substitution Principle and subsets

In Object Oriented Design they have SOLID principles and L in the acronym stands for the Liskov Substitution Principle, which says that derived classes must be substitutable for their base classes.

We can say that class is the set of all instances of this class, then if we have class A and class B - a subclass of A:

class A {}
class B extends A {}

and they follow the Liskov Substitution Principle, we can see that set B is a subset of A because all instances of class B also considered instances of class A (from behavior point of view, nominally they are different).

Note: we just showed correspondence between sets, types (in programming) and classes.

Cardinality

If each element of A can be paired with an element of B such that each element of A occurs exactly once, and each element of B occurs exactly once, in the pairings we say that A and B are equivalent, denoted A ~ B. If a set A is equivalent to a subset of the set of integers, we say that set A is countable. If A is null or equivalent to the set {1, 2,.. m }, where m is a positive integer, we say A is a finite set. The cardinality of a finite set is m, the count of its elements.
An indexed set is set of the form {(1, a), (2, b),.. (m, n)}
(a, b) is called tuple (or ordered pair). (a, b,.. z) is called n-tuple, also sometimes people refer to n-tuples as tuples.

Relational database and DataFrames

Tables in relational databases meant to be set of tuples (set of tuples also can be called frame). Some databases ignore this idea, for example, MySQL.
This explains why it is nice to have a primary key - it is easy to guarantee the uniqueness of each tuple (no need to check all values, we can check the only one).
This also explains why it is ok that PostgreSQL returns rows in different order unless ORDER BY specified for each query - because the set doesn't provide order. MySQL, for comparison, returns rows in the order they were inserted (at least it did when I checked the last time).

The cardinality of a finite set in SQL

SELECT COUNT(*) FROM table_name;

In Python, they have pandas.DataFrame. Which can be modeled as an indexed set of tuples or as a tuple of indexed sets. It is implemented as a column-oriented store, for the mathematical model it doesn't matter, it changes performance, but not an abstraction.

Intersection and union operations

The intersection of two sets A and B, denoted by A ∩ B, is the set which elements belongs to A and B sets simultaneously:

A ∩ B = { x: x ∈ A and x ∈ B }

For example

A = {1, 2} and B = {2, 3, 4} then A ∩ B = {2}

The union of two sets A and B, denoted by A ∪ B, is the set which elements belongs to either A or B set:

A ∪ B = { x: x ∈ A or x ∈ B }

For example

A = {1, 2} and B = {2, 3, 4} then A ∪ B = {1, 2, 3, 4}

The disjoint union of two sets A and B, denoted by A ⊔ B, produces set whose members are tagged elements. A tagged element is one for the form x:$ where x ∈ X is the element and the symbol $ is the tag. A tag sometimes called an identifier or a color; it may be a string, a numerical value, or another piece of information. With a disjoint union, we tag element with the name of the set containing it. For example

A = {1, 2} and B = {2, 3, 4} then A ⊔ B = {1:A, 2:A, 2:B, 3:B, 4:B}

The (Cartesian) product of sets A and B, denoted by A × B, is set with combinations of elements from sets A and B

A × B = { (a,b): a ∈ A or b ∈ B }

For example

A = {1, 2} and B = {2, 3, 4} then A × B = {(1,2), (1,3), (1,4), (2,2), (2,3), (2,4)}

More about types

If we say there is a correspondence between sets and types there suppose to be correspondence for operations as well.

Let's use TypeScript for examples. Union:

type A = 'a' 
type B = 'b'
type C = A | B

In type systems, disjoint unions sometimes are called discriminated unions or tagged unions. For example

interface Rectangle {
    type: "rectangle";
    width: number;
    height: number;
}
interface Circle {
    type: "circle";
    radius: number;
}
type Shape = Rectangle | Circle;

in the given example type field is a tag. This kind of pattern often used in event processing and works well with pattern matching.

Before we can move on to Cartesian product we need to talk about how record types can be represented with sets

type A = { a: number };
type B = { b: number}

We can represent this as a set of tuples, where the first element in every tuple is the name of the property:

A = {(a, 1), (a, 2),.. (a,n)}
B = {(b, 1), (b, 2),.. (b,n)}

How would a Cartesian product look like?

C = A × B = {((a,1), (b,1)), ((a,1), (b,2)), ((a,2), (b,1)),.. ((a,n), (b,n)) }

As you can guess tuple of tuples represent Record with two fields:

type C = { a: number, b: number }

Then Cartesian product corresponds to "Intersection Types" (at least this is how they call it in TypeScript).

type C = A & B

PS

My idea was to introduce a bare minimum of the theory, but enough to see how it is used in real life. Tell me if it is graspable or not, especially if you don't have a CS degree.

Do you consider useful to understand a bit of theory behind those subjects? Did any of those ideas click for you?

Cover image: Construção, Ione Saldanha

Top comments (8)

Harvey Thompson • Apr 22 '19 • Edited

Great article! This sort of understanding is becoming more and more useful - quite a few new programming languages include some or all of these concepts (such as TypeScript, Scala, Swift, Kotlin, Ceylon).

I highly recommend the Types and Programming Languages by Benjamin C. Pierce. It's somewhat math heavy, if you're not into such things, you can still get a lot of the concepts out of it.

Also I think there's a couple of minor mistakes:

The code block when first introducing Cartesian product uses A ∪ B, should be A × B.
This section does not seem to be correct (perhaps it is in TypeScript, but I doubt it): "Then Cartesian product corresponds to "Intersection Types" (at least this is how they call it in TypeScript)." type C = A & B - these are two separate concepts as you described correctly earlier.

I'm developing a new programming language which has these concepts, so this kind of thing really interests me.

stereobooster • Apr 22 '19 • Edited

The code block when first introducing Cartesian product uses A ∪ B, should be A × B.

Yes it was a typo. FIxed. Thank you

This section does not seem to be correct

Why not?

I'm developing a new programming language

wow, cool. Do you have something in public already (link to a repo)?

Harvey Thompson • Apr 23 '19

At least in Set Theory cartesian product and intersection are two separate concepts: product produces tuples, intersection produces a subset of common elements.

stereobooster • Apr 23 '19

Ah, yes. Similar names - my favourite source of the confusion. Read this as following intersection types in TypeScript can be explained with or corresponds to cartesian product in sets. If you would treat "intersection types" as proper noun (not as name with meaning) you can follow the logic of the article, right?

Full picture: actually it is not coincidence that "intersection types" name collides with "intersection" operation in sets. Let's rewind a bit and see what we are doing here - we showed partial correspondence between types and sets. We didn't show full correspondence, for example how to encode lambda types (a -> b), how to encode parametric types. We just show correspondence of some ideas in sets to some ideas in types in programming language. And it is up to us how we choose correspondence, it is not necessary one to one correspondence.

So what they call "intersection types" can be explained with cartesian product (what I did, and what I have seen in some papers) or with intersection (what probably authors of TS did). If we take that type A = {a: number} corresponds to sets of all records (or objects in terms of JavaScript) which have property a as number, but not limited to this property only, for example {a: number, c: string} also counts as type A; and we take type B = {b: number}, we can see that objects of type {a: number, b: number} is of the type A and B simultaneously. We can say that type C = {a: number, b: number} is the same as A & B and in types this operation would correspond to intersection.

I called it partial correspondence, other way to name is "modeling", the same way they build mathematical model of physical phenomenon (F = GM₁M₂/r²). We modeled types in programming language with sets from math. And some models work well in one cases, and don't work in different. Models are approximations, models are platonic representation of some phenomenon.

Does it make sense now?

Harvey Thompson • Apr 24 '19 • Edited

Yes, I think so. I think you're talking about the construction of the properties of the Intersection Type C using the product operation on A and B to produce the state/record of C. This is probably valid for TypeScript (I'm guessing here, since I'm not super familiar with it).

It was confusing me because I tend to think about the type representing the set of possible instances of the types involved, so the names match: a Union Type has instances which are the union of each type's instances, similar for Intersection Type (the instances of C are by definition the intersection of A and B's instances).

For my language, at the type level, state is not directly modelled because it's hard to reason with and does not allow multiple inheritance (*). Thus when I deal with union and Intersection types the operations to implement these are always set-theoretic union and intersection (set of permissible methods, super-type, etc).

As you also mention, computation of union and intersection types (and different/not types, which I also will have because I want Flow Typing which pretty much requires all three), together with a rich type system including generics, tuples, functions and with a sound and complete sub-typing relationship makes for a complex type system.

(*) I have multiple type kinds, one of which is Class (a Type with ordered state), which is limited to single inheritance of another Class, but would eventually allow multiple inheritance of Traits (a Type without state). The state is not visible to the type system, and so reasoning about type operations is made a lot easier. A lot of languages have adopted this, but I'm guessing TypeScript (because it's on top of JavaScript) is not one of them.

NOTE: I could be talking complete nonsense because this stuff is freakingly complex, but it's working well so far.

FYI: You might like On Declarative Rewriting for Sound and Complete Union, Intersection and Negation Types - and some of the previous papers also by David J Pearce.

Harvey Thompson • Apr 24 '19

Oh I forgot to respond to your question: "Do you have something in public already (link to a repo)?" - not currently, I'm at version 0.2 so it's early years yet - ask me again in about two years ;-) (Who knew writing a programming language could be so hard!)

Jesse M. Holmes • Apr 22 '19

I haven't seen set theory in a long time, and you remind me how much I miss math. What are the most common ways that you are using it to solve problems today?

stereobooster • Apr 22 '19

Well, as I said as data structure I would use it to find unique values of the list (like a list of ids) or find the intersection of two lists, or check if the item already present in the list. Plus I use TypeScript - unions, intersections.