DEV Community

Mistry Khateyi
Mistry Khateyi

Posted on

Understands Relationships, Schemas and Joins in Data Analytics for Beginners.

Introduction.

One of the most difficult aspects for a data analyst looking to become successful is learning how data from various tables relates. SQL queries, SQL Dashboards in Power BI, or designing a database, three things are seen throughout:

*Relationships
*Schemas

  • Joins

The below concepts are the foundation of relational databases. Once you know them, you will find data easier to analyze.
In this article, we will discuss each concept, and provide some examples.

Flowchart illustrating how raw data is transformed into insights through relationships, database schema design, SQL joins, and interactive dashboards.

What is a database relationship?

A relationship is a linkage between two or more tables by the same column.
Imagine an online store with two tables:

Customers

CustomerID Name
101 Alice
102 Brian
103 Carol

Orders

OrderID CustomerID Product
5001 101 Laptop
5002 101 Mouse
5003 103 Keyboard

Notice that the two tables have a column named CustomerID.
The column in that column is the person who placed the order.
If there were no relationships, each table would be considered a separate table.

Types of relationships

One-to-one relationship

One row in one table matches one and only one row in another.

One to many relationship

This is the most common relationship. This relationship is commonly leveraged in power BI models.

Many to many relationship

There are several records in one table that are associated with several records in another table.

What is database schema

Is the logical organization of a database is called a schema.

It defines:
*Tables

  • Columns
  • Relationships
  • Constraints

Database designers build schemas before data is stored, similar to how an architect designs a building before it is built. For example the blueprint of a building.

Common Schema Types

Star Schema

The best model to use in Power BI. It consists of :

  • One Fact table
  • Multiple Dimension tables For example;
            Products
                |
Customers ---- Sales ---- Dates
                |
            Employees
Enter fullscreen mode Exit fullscreen mode

Advantages:

  • Fast performance
  • Easy to understand
  • Optimized for reporting
  • Recommended by Microsoft

Snowflake Schema

A snowflake schema forces dimension tables to be broken into other tables.
Example

Sales
   |
Products
   |
Category
   |
Department
Enter fullscreen mode Exit fullscreen mode

Advantages

  • Reduces redundancy
  • Better data consistency

Disadvantages

  • More joins
  • More complex queries

Galaxy Schema

Also known as a Fact Constellation. It has several fact tables with dimension tables.
Example

Sales ------ Customers
   |
Inventory ---- Products
Enter fullscreen mode Exit fullscreen mode

This is normally used in large organization.

What Is a SQL Join?

A join is a method used to create a result set from two or more tables based on the key column of one table that matches the key column of another.
Joins are essential for being able to query data from multiple tables.

INNER JOIN

Its Only the matching records will be returned.
Example

SELECT
    Customers.Name,
    Orders.Product
FROM Customers
INNER JOIN Orders
ON Customers.CustomerID = Orders.CustomerID;
Enter fullscreen mode Exit fullscreen mode

Result

Name Product
Alice Laptop
Alice Mouse
Carol Keyboard

Brian is not included as he has no orders.

LEFT JOIN

Selects all rows from left table.

If there are matching records in the right table, they are added.

SELECT
    Customers.Name,
    Orders.Product
FROM Customers
LEFT JOIN Orders
ON Customers.CustomerID = Orders.CustomerID;
Enter fullscreen mode Exit fullscreen mode

Result

Name Product
Alice Laptop
Alice Mouse
Brian NULL
Carol Keyboard

RIGHT JOIN

Returns all the rows from the right table.
If there is a mismatch on the left side then those matches are null.

FULL OUTER JOIN

Returns:

  • All matching records All records that do not match between the two tables. This can be helpful to detect missing data.

CROSS JOIN

Makes all possibilities.
If one table has 10 rows and another table has 20 rows then the result contain.
A table joins with itself. Useful for:

  • Employee-manager relationships
  • Organizational hierarchies
  • Family trees

Relationships vs Joins

Relationships and joins are different, yet appear as if they are.

Relationships Joins
Permanent join of tables Temporary operations
Defined in the data model Written in SQL
Automatically used by Power BI Used when you query the data
Enhance the navigation of the model Add data to the model during execution

Best Practices

Use a Star Schema as much as you can

1.create relationships using unique keys.

If possible, do not use many-to-many relationships.

  1. Use meaningful primary keys.

3.Minimize unnecessary joins.

  1. Maintain dimension tables with no duplicate data.

Real-World Example

Suppose you are developing a Sales Dashboard in Power BI.
You might have:

  • Sales (Fact table)
  • Customers
  • Products
  • Date
  • Regions

The relationships link the tables together and Power BI uses the relationships to sum up sales by customer, by product, by month, or by region. In SQL you would have to join the tables beforehand and then work out your results.

Conclusion

The ability to understand relationships, schemas and joins are some of the most important skills for any Data Analyst. Relationships are the way that tables are related to each other, schemas provide a structure for the entire database, and joins allow to retrieve related data from other tables.
Understanding these concepts will help you design more effective SQL queries, create efficient Power BI data models, and design scalable databases.

Top comments (1)

Collapse
 
code_with_mwai profile image
Mwai Victor Brian

This awesome well....ellaborated