Darragh O'Riordan

Posted on Jun 23, 2022 • Originally published at darraghoriordan.com on Jun 13, 2022

Relational data

#beginners #tutorial #database #postgres

Lesson goal

In this lesson you’ll learn about relational data with typeorm and postgres.

We’ll also cover entity relational diagrams, running database migrations and some special relations in typeorm.

Database course index

This is part of a full course on persistence in postgres with typeorm and sql!

There is a github repo to go with this course. See part 2 for instructions.

Relations in RDBMS

An RDBMS will manage relations between entities and guarantee referential integrity for you. In other words if you say that an entity depends on another entity, the RDBMS will prevent the dependency from being deleted while something else references it.

This is incredibly powerful because it reduces cognitive load for developers. You can try to delete something and know that the database will prevent you from deleting something in use.

A relation is defined by setting a foreign key relationship. You specify that the id value in a column in one table corresponds to a key value in another table.

In the following diagram the pet.ownerid column references the primary key of the pet_owner column.

Glossary: Entity relation diagram

The diagram above is an “entity relational diagram”. This type of diagram was created way back in 1970 to describe the data model in database applications. The tables define the entities and the lines and line endings describe the relationship.

The line endings have special shapes that indicate if a side of a relationship can have many or one (and all the combinations of those). For example a pet_owner can have many pets, but a pet can only have one owner in our model.

This example of “a single pet owner per pet” is a good one because it is probably naive and wrong! - many pets will have multiple owners!

This is a good example of using diagramming to test our assumptions and work through models. We usually go through a number of iterations on these designs before writing any code.

Introducing the dog walking app

Here is a proposed ER diagram for a fictional dog walking business. It defined the pets, their owners, the walkers and of course the appointments.

We will cover this diagram in more detail as we work through the different types of joins we can do.

Glossary: Migrations

Database designs evolve over time. You will not design it correctly the first time and your application will change.

Migrations are files with sql commands that describe the steps to take a database schema from one state to another state.

Each migration is versioned and thay are run linearly such that any instance of a database can check which migrations have not yet been run and can run the missing migrations.

The guarantee is that any database that has the same migrations applied will be in the same schema state as any other database with the same migrations applied. This is as long as the migrations are the only thing changing schema state!

We have to be able to update the schema of our database consistently and safely for a given version of our application.

If you consider that a developer needs to change a database locally in isolation for their feature, then they deploy on QA and need to test their change so they must apply their migration on QA, but they do not want to update production yet because the change is not tested.

Migrations isolated these small changes from each other.

Migrations can be automatically generated by tools like typeorm that look at the current state of the database and compare it with the models a developer has changed in their code.

The steps required to have the database support the new code is the migration. This is the way “code-first migrations” work.

Typeorm is typically used this way.

The following is an example of a typeorm migration.

import { MigrationInterface, QueryRunner } from 'typeorm'

export class addmorefilteroptions1614687453606 implements MigrationInterface {
  name = 'addmorefilteroptions1614687453606'

  public async up(queryRunner: QueryRunner): Promise<void> {
    await queryRunner.query(
      `ALTER TABLE "received_offer" ADD "roleLocation" character varying NOT NULL`
    )
  }
}

It’s common to have some scripts in your code to generate and run migrations in dev

"scripts" :{
    "typeorm": "ts-node -r tsconfig-paths/register ./node_modules/typeorm/cli.js",
    "typeorm:migration:generate": "npm run typeorm -- migration:generate ./src/migrations/ormmigration -d ./src/database-connection/appDatasource",
    "typeorm:migration:run": "npm run typeorm -- migration:run -d ./src/database-connection/appDatasource",
    "typeorm:migration:revert": "npm run typeorm -- migration:revert -d ./src/database-connection/appDatasource"
}

Note that typeorm will also generate a “down” script for you to revert the changes introduced in a migration. This should only be used in your local development environment.

In any shared environment like QA or production you should instead create a new migration where applying it using up reverts the change you need to revert.

Running migrations

In production applications we generally use migrations to manage schema change in RDBMS. There are migrations for the lesson 4 full structure for this project already prepared in src/migrations/l4-migration.ts.

Before moving on, please clear your database server of work from previous lessons by running

yarn destroy

Then you can run this migration with

yarn typeorm:migration:run

Then run the migration as described above. This will give us all the same database schema.

We will need all the tables later for the relations examples.

Glossary: RDBMS Indexes

Indexes are lists of pointers to data in our table that the database maintains for us. Just like the index of a book they are used by the database to look up a record very quicly .

In a database if we can lookup using an index then our query will be significantly faster than one without. A lookup on a database table that doesn’t have an index on the properties we are matcing on is called a “scan”. Every type of database should be designed to reduce table scans.

We don’t put indexes on every column because everytime we update or insert a record we have to update the indexes. This is one of the featurs of RDBMS that make them slower than something like dynamoDB.

You should add indexes to columns that are used for reading data. This depends on your access patters. For example if you always lookup users by email address, you might want to consider adding an index on the email_address column for users.

RELATIONS: One to many / many to one

A many to one relation is very common in most applications. In this diagram we’re saying that an owner can have many pets. But a pet only has one owner.

There are some things that I recommend always doing that slightly differ from the typeorm docs.

I always specify a join column manually so I have control over the name
I add an index to the join column
I generally always define both sides of the relationship even though this isn’t required

// note - other properties are removed in this code snippet

// the side that can have many items
@Entity()
export class PetOwner {
  @OneToMany(() => Pet, (pet) => pet.owner)
  pets!: Pet[]
}

// the side that has a single item
@Entity()
export class Pet {
  @Column()
  @Index()
  owner_id!: number

  @ManyToOne(() => PetOwner, (owner) => owner.pets)
  @JoinColumn({ name: 'owner_id', referencedColumnName: 'id' })
  owner!: PetOwner
}

GLOSSARY: Typeorm cascade saves and updates

When setting relations on an entity you can turn on the cascade option, and persistance options and typeorm will automatically save the relations in your code model to the database. Note that this is not the same as Postgres’ CASCADE, it is a typeorm feature.

  @ManyToOne(() => PetOwner, (owner) => owner.pets, {cascase: true})
  @JoinColumn({ name: 'owner_id', referencedColumnName: 'id' })
  owner!: PetOwner

I don’t like this automatic saving, I prefer to save each part manually even though it’s a bit more tedious. So you can see in all my relation decorators I do not set cascade.

If you’re saving multiple related things then it’s probably better to wrap a set of related inserts in a transaction but i’ll leave transactions out of the code for now.

RELATIONS: One to one

A one to one relationship is similar in typeorm but you only define one side of the relationship.

It’s important to remember that you must explicitly set which table contains the join column.

Consider the following ER diagram where each pet has their own profile.

To add this to our pet model we would add a one to one column and specify the join column to tell typeorm which table should have the join column.

// note - other properties are removed in this code snippet

// pet with a one to one profile relation
@Entity()
export class Pet {
  @OneToOne(() => PetProfile)
  @JoinColumn({ name: 'profile_id', referencedColumnName: 'id' })
  profile!: PetProfile
}

RELATIONS: Many to many

Many to many type relations are also very common in applications. However many to many relations are not explicitly supported in most RDBMS.

A very common solution to this problem is to use “cross-tables”. So we have to use an additional join table to achieve the result we need. This is an example of where typeorm shines because it abstracts away these cross-join table(s) for us.

Consider if we want to add emergency contacts to our pet system. Emergency contacts like vets might be shared by many pets, pets might have multiple emergency contacts in a family. And animals will very likely have multiple vets as options.

If we were modelling this in a relationship diagram it would look like

The “crows foot” indicates many in an ERD diagram. However the database columns for Ids can only contain one number so the actual implementation for this will be different.

We must use a third table to store the many-to-many relation in RDBMS.

See this implementation diagram for example

If we wanted to add these relationships in typeorm we would add a @JoinTable() decorator.

// note - other properties are removed in this code snippet

// on one side of the relationship
@Entity()
export class Pet {
  @ManyToMany(
    () => EmergencyContact,
    (emergencyContact) => emergencyContact.pets
  )
  emergencyContacts!: EmergencyContact[]
}

// the other side
@Entity()
export class Vet {
  @ManyToMany(() => Pet, (pet) => pet.emergencyContacts)
  @JoinTable({
    name: 'emergency_contact_pet',
    joinColumn: { name: 'petId', referencedColumnName: 'id' },
    inverseJoinColumn: { name: 'vetId', referencedColumnName: 'id' },
  })
  pets!: Pet[]
}

When saving many-to-many entities you should save the items on both sides first as required, then set the relation using an array like a one to many relationship.

Saving relations

To save relations you must first create and save the entities on each side of the relation. Then you assign the entity to relevant properties and save the relation.

// create some pets
const mikeCatProfile = petProfileRepository.create({
  isHouseTrained: false,
  name: 'mikes cat',
  type: 'cat',
  breed: 'shorthair',
  instagramUrl: 'someurl',
})
const mikeDogProfile = petProfileRepository.create({
  isHouseTrained: true,
  name: 'mikes doggo',
  type: 'dog',
  breed: 'pom',
})
await petProfileRepository.save([mikeCatProfile, mikeDogProfile])

// create and save a new pet owner
const mikePetOwner = petOwnerRepository.create()
mikePetOwner.name = mikeName
mikePetOwner.address = 'some address'
mikePetOwner.homePhone = '12345'
mikePetOwner.mobile = '54321'

await petOwnerRepository.save(mikePetOwner)

// save the pets
const mikePets = petRepository.create([
  { owner: mikePetOwner, profile: mikeDogProfile },
  { owner: mikePetOwner, profile: mikeCatProfile },
])

const result = await petRepository.save(mikePets)

selecting relations (joins)

SQL joins have already been described by experts better than I could do again here. Here are some nice resources for reference.

I use this diagram as a reference when I can’t remember which one I need.

sql joins

Jeff Atwood wrote a great article with examples of joins back in 2007 - https://blog.codinghorror.com/a-visual-explanation-of-sql-joins/

When we’re accessing a relation in our code model, we usually have the left-hand-side of the relation already. This means that want to do left joins when selecting the relations (get the relations for the current item, or null).

Because of this, typeorm provides a shortcut to left join the relation. The following will get all pet owners, and their pets if they have any.

const users = await dataSource.getRepository(PetOwner).find({
  relations: {
    pets: true,
  },
})

An alternative way to select relations in typeorm is to use a query builder. This also allows us to do an inner join (only get pet owners that actually have pets).

// this is identical to the above query
await dataSource
  .getRepository(PetOwner)
  .createQueryBuilder('petOwners')
  .leftJoinAndSelect('petOwners.pets', 'pet')
  .getMany()

// this will only get pet owners with pets (inner join)
await dataSource
  .getRepository(PetOwner)
  .createQueryBuilder('petOwners')
  .innerJoinAndSelect('petOwners.pets', 'pet')
  .getMany()

Complex selection

Suppose we needed to get the furthest scheduled appointment for mikes pets

If we hand coded this in sql it would be something like this.

SELECT pprofiles.name, "startAt" from "walkingapp"."walking_appointment"
inner join "walkingapp".pet as pets on pets.id = walking_appointment."petId"
inner join "walkingapp".pet_profile as pprofiles on pprofiles.id = pets."profileId"
inner join "walkingapp".pet_owner as powner on powner.id = pets."ownerId"
where powner.name = 'mike loves doggos'
and "startAt" = (select max("startAt") from "walkingapp"."walking_appointment"
inner join "walkingapp".pet as pets on pets.id = walking_appointment."petId"
inner join "walkingapp".pet_profile as pprofiles on pprofiles.id = pets."profileId"
inner join "walkingapp".pet_owner as powner on powner.id = pets."ownerId"
where powner.name = 'mike loves doggos')

Theres a sub query in this main query. The typeorm should look somewhat similar but we create the queries as separate parts.

const maxNextDateQuery = walkAppointmentRepo
  .createQueryBuilder('mxapts')
  .select(`max(mxapts."startAt")`)
  .innerJoin(Pet, 'mxpets', `mxapts."petId" = mxpets.id`)
  .innerJoin(PetProfile, 'mxpprofile', `mxpprofile.id = mxpets."profileId"`)
  .innerJoin(PetOwner, 'mxpowner', `mxpowner.id = mxpets."ownerId"`)
  .where('mxpowner.name = :ownerName', { ownerName: mikeName })

const appointmentQueryResult = await walkAppointmentRepo
  .createQueryBuilder('apts')
  .select(`"pprofile"."name", "apts"."startAt"`)
  .innerJoin(Pet, 'pets', `apts."petId" = pets.id`)
  .innerJoin(PetProfile, 'pprofile', `pprofile.id = pets."profileId"`)
  .innerJoin(PetOwner, 'powner', `powner.id = pets."ownerId"`)
  .where('powner.name = :ownerName', { ownerName: mikeName })
  .andWhere(`apts.startAt = (${maxNextDateQuery.getSql()})`)
  .getRawMany()

And we can see that typeorm produces

SELECT "pprofile"."name", "apts"."startAt" FROM "walkingapp"."walking_appointment" "apts"
INNER JOIN "walkingapp"."pet" "pets" ON apts."petId" = "pets"."id"
INNER JOIN "walkingapp"."pet_profile" "pprofile" ON "pprofile"."id" = pets."profileId"
INNER JOIN "walkingapp"."pet_owner" "powner" ON "powner"."id" = pets."ownerId"
WHERE "powner"."name" = $1
AND "apts"."startAt" = (SELECT max(mxapts."startAt") FROM "walkingapp"."walking_appointment" "mxapts"
   INNER JOIN "walkingapp"."pet" "mxpets" ON mxapts."petId" = "mxpets"."id"
   INNER JOIN "walkingapp"."pet_profile" "mxpprofile" ON "mxpprofile"."id" = mxpets."profileId"
   INNER JOIN "walkingapp"."pet_owner" "mxpowner" ON "mxpowner"."id" = mxpets."ownerId"
   WHERE "mxpowner"."name" = $1)

You’ll see that once you start to query specific data from specific relations, especially brining in aggregates or sub queries then using type orm doesn’t hide SQL from you very much. You still need to know what the underlying data store is doing.

The response for these kinds if ad-hoc relational queries are not typed with entities, they couldn’t be. These kinds of relations and ad-hoc queries are often where ORM abstractions and automated tooling like typeorm starts to break down and you go back to having to understand which join you need for specifc tasks.

Tip: Setting relationships with relation id

If you have created the join id column on your entities, you can use this to set relations without using full entities. The following example will set the pet owner without having to go get or create a full pet owner entity.

const pet = new Pet()
pet.owner_id = 23
const savedPet = await petRepository.save(pet)

Common object-relational mapping patterns

We have seen that cross-tables is a common way to map many-to-many object relations in a relational database.

There are two other paradigms that are very common in RDBMS databases.

The RDBMS might not be the best choice for these, but it is often the first database that all application data is stored in when working in a small team so there are well-known solutions.

Time series

You can store time series in an RDBMS. This is useful for any system that tracks state over time like tickers or IOT sensor state (temperature).

Instead of a record that updates when changing, you add a new row with a timestamp that describes that state at that time.

I mention time series here in the relations section because the series will often relate to an entity. You have to be careful reading and writing from a table like this, if using indexes you might cause undesirable locking.

This is an example of small time series. Instead of 2 rows - one for each entity, we keep a history over time for the entity state.

timestamp	transaction_id	state
1	1	CREATED
2	1	RUNNING
3	2	CREATED
4	1	ERROR
5	1	STOPPED
6	2	RUNNING

You want to avoid locking the table but you also very likely have to have transactional integrity to avoid bad states. Code that reads and particularly that writes to time series should have fault tolerance for concurrency issues (if you don’t have a last in wins type strategy).

If you continue working with large time series data you should consider using timescaleDB instead of regular postgres. Timescale is an enhanced postgres database that adds specific functionality for working with time series such as continues aggregation, or selecting first and last values for a time period.

Trees

Trees are very common in computing in general. In an RDBMS you can use “self-referencing” relations and this is a very common solution to trees.

Trees are so common that typeorm provides helper methods and support for tree structures. It supports multiple models for trees (https://typeorm.io/tree-entities) but i’ve always used adjacency lists.

To use you create an entity that self references like this

@Entity()
export class Category {
  @PrimaryGeneratedColumn()
  id: number

  @Column()
  name: string

  @Column()
  description: string

  @ManyToOne((type) => Category, (category) => category.children)
  parent: Category

  @OneToMany((type) => Category, (category) => category.parent)
  children: Category[]
}

Then instead of getting a normal repository you must get a tree repository.

const trees = await dataSource.manager.getTreeRepository(Category)

The tree repository exposes specific methods

findTrees()
findRoots()
findDescendants()
findDescendantsTree()
findAncestors()
findAncestorsTree()

You can see that typeorm has nice built in support for tree-like structures.

Summary

RDBMS are built to support and work with relational data. Typeorm gives us decorators to easily define our relations.

There are some special instances of relations that type orm supports, you should use the typeorm functionality instead of reinventing it.

DEV Community