Quick Overview of Relationship Schemas and Joins

#sql #analytics #database #beginners

Introduction

BI analysts are tasked with answering business questions for their clients. Before answering questions concerning revenue, profits, or growth prospects, an analyst needs to understand where the data is stored and how to bring them together. And here is where relationship schemas and joins come in. Data is usually not stored in one place. It is spread across multiple sheets originating from different systems, such as CRM records, POS systems, marketing campaigns, or off-system sales.

Relationship databases are made of two core concepts: relationship schemas and joins. Relationship Schemas is the framework on what data a given systems store and how the different sections of the data connect. Meanwhile, Joins is the means by which an analyst can query the connections established by relationship schemas. Ultimately, the latter creates the full picture of the data and answers business questions.

What is a Relationship Schema?

A relationship schema is a description of how data is structured. It describes the structure of each table and how the different tables relate to one another. A schema is more of the skeleton or structure for where data is stored and how the different information relates or connects. In addition, it is a map that tells you that a certain table exists and what columns the different tables hold and how these tables connect.

We have different types of tables as expounded below:

Flat Table – simple table with everything included without relationships being highlighted.
Fact Table – is the heart of a data warehouse with little descriptive texts and majoring in measurable business events.
Dimension Tables – provides context around the facts and richer in descriptions.
Bridge Table – also known as a Junction Table. It sits between two tables and holds one row per pairing in many-to-many relationships

When we narrow down to the tables in the relational database, each row represents one specific instance of the entity whereby each entity is maintained a table. Moreover, each column represents an attribute of the entity (specific table). Basically, the importance of a relationship schema is to store each piece of the data once and reference it whenever its required. For these relationships to work in a data design, you need two keys: primary and secondary.

The primary key is a must for every table because without it you have no way to uniquely refer to a specific row. Contrarily, foreign keys act as a link or pointer for more details. It references the primary key of another table and acts as a relationship between two tables. Furthermore, we have other key types: composite, surrogate and natural keys.

Two of the most common schemas include the Star and Snowflake schema. The former is the most commonly employed schema by analysts. It contains a fact table sitting a center surrounded by dimension tables. The dimension tables are smaller than the fact table and usually just provide context to the events captured in the fact table. The fact table contains foreign keys that point to each dimension. Meanwhile, the Snowflake schema is sort of an upgraded version of the Star where the dimension tables give links to other tables that allow one to create or write more joins. Other schemas you might come across include Galaxy schema, Data Vault, and the One Big Table.

There are three types of relationships:

One-to-One – one record in one table corresponds to exactly one record in another table.
One-to-Many – one record in one table corresponds to many records in another table.
Many-to Many – many records in one table corresponds to many records in another table, thereby requiring a bridge table.

What are Joins?

Once you have understood what a relation schema is and the importance of having structure in data housing the next step is leaning how to query the same data. A Join is an SQL operation that combines rows from tables based on columns with relationships. It usually follows matching of a foreign keys to a primary key. As we had explained earlier, data is usually generated from different tables or worksheets from multiple systems in a business. In that case, Joins brings the different relevant pieces together when you want to draw information.

Joins include the Inner Join, Left Outer Join, and the Right Outer Join. The Inner Join returns only rows where there is a match in both tables while the Left Outer Join returns all rows from the left table. In addition, the latter also returns any matching rows from the right table keeping in mind that no matches from the right column are filled with NULL. Conversely, the Right Join will return all the matching rows from the right table and left table, and unmatched columns on the left become NULL. Other Joins include:
• Full Outer Join returns all the rows from the tables and where here no matches NULL is filled on both sides.
• Cross Join literally combines everything since it produces a cartesian product of two tables resulting in exponential growth.
• Self Join is a table joined to itself which is useful in recursive or hierarchical data in a single table

DEV Community

Quick Overview of Relationship Schemas and Joins

Introduction

What is a Relationship Schema?

What are Joins?

Top comments (0)