Barbara

Posted on Nov 29, 2021 • Edited on Dec 1, 2021

Relational data models

#sql #dataengineering #learn #sketchnote

What

The relational data model was invented in 1969 at the IBM research department, with the lead researcher Edgar R. Codd. They created the 12 rules (13) to manifest the requirements for a data management system.

Rule 1: The information rule:
All information in a relational data base is represented explicitly at the logical level and in exactly one way – by values in tables.

Why

Relational data models are important as they help us with

standardisation of data models
flexibility in adding and altering tables
data integrity
structured query language can be used
simplicity - the tabular format
intuitive organisation - spreadsheets

When

small data volumes
secondary indices needed
Analytics needed
Aggregations needed
ACID transactions needed
Flexibility for writing queries: In relational data models we are modelling data and not queries.
easy to change, when business requirements change

Structure

Why

free database from unwanted insertions, updates and deletion dependencies
reduce the need of refactoring the database as new types of data are introduced
make the data more informative for users
neutral to quer statistics. We don't want to design for particular queries

Normalisation

Normalisation is needed to reduce data redundancy and to increase data integrity. The overall goal here is to have updates only in one place.

1NF - First Normal Form

Atomic values: each cell contains unique and single values
Be able to add data without altering tables
Separate different relations into different tables
Keep relations between tables together with foreign keys

2NF - Second Normal Form

1NF is reached
All columns in a table rely on the primary key

3NF - Third Normal Form

2NF is reached
No transitive dependencies

There are up to 6 normal forms. But most of the time we will need only up to 3.

Denormalisation

When everything is normalised, we need Joins to gather the data. The more Joins you need, the slower the query will be. Denormalisation must be done to increase performance.

Tables

Fact table

Consists of the measurements, metrics of facts of business process. EVENTS THAT HAVE ACTUALLY HAPPENED - like a transaction

Dimension table

A structure that categorises facts and measures in order to enable users to answer business questions.
Dimensions can be for example people, products, place or time.

Schemas

Star Schema

a physical model resembling a star shape
a fact table is at its center, surrounded by dimension tables
the simplified version of the snowflake scheme

Advantages

denormalised | simplifies queries | fast aggregations

Disadvantages

downsides of denormalisation | data integrity | decrease query flexibility | many to many relationship |

Snowflake Schema

logical arrangement of tables in a multidimensional database represented by centralised fact tables with are connected to multiple dimension.

Advantages

In the 1NF and 2NF it is more normalised than the STAR Schema

Sketchnote

To have all the above information in one view, I made a sketchnote.

If you need a higher resolution please use this page

Useful links

PostgreSQL: Documentation
PostgreSQL: Tutorial

DEV Community

Relational data models

What

Why

When

Structure

Why

Normalisation

1NF - First Normal Form

2NF - Second Normal Form

3NF - Third Normal Form

Denormalisation

Tables

Fact table

Dimension table

Schemas

Star Schema

Advantages

Disadvantages

Snowflake Schema

Advantages

Sketchnote

Useful links

Top comments (0)

Read next

Data Modeling - Entities and Events

Best Practices for Alembic and SQLAlchemy

LuxDevHQ (Lux Academy and Data Science East Africa) Training Program in AI, Data Science, Analytics, and Data Engineering

Top 🐘👀 Postgres Monitoring Tools 🧰 and Best Practices in 2024 🔝