DEV Community

WaithakaJoseph
WaithakaJoseph

Posted on

Article on Modelling, Joins, Relationships and Different Schemas In Power BI

Data Modeling, Relationships, and Schemas in Data Analytics

In the fields of data analytics, data warehousing, and database management, modeling and schema design are the fundamental pillars used to organize and query information efficiently.

This article provides a comprehensive guide to these core concepts.

1. Data Modeling

Data modeling is the architectural process of designing how data is stored, interconnected, and accessed within a system.

Core Questions Addressed:

  • Storage: What specific data points need to be captured?
  • Structure: How should individual tables be organized?
  • Connectivity: How do these tables interact with one another?

Levels of Data Models:

  1. Conceptual Model: A high-level business perspective focusing on entities and their relationships, devoid of technical specifications.
  2. Logical Model: Defines specific attributes, keys, and relationships. It is independent of the Database Management System (DBMS).
  3. Physical Model: The actual implementation within a database, including technical details like indexes, partitions, and storage requirements.

2. Relationships

Relationships define the logic of how data in one table corresponds to data in another.

  • One-to-One (1:1): A single record in Table A relates to exactly one record in Table B.
  • One-to-Many (1:M): The most common relationship; for example, one Customer can place many Orders.
  • Many-to-Many (M:M): Multiple records in one table relate to multiple records in another. This requires a Junction Table (Bridge Table) to function. Example: One Student can enroll in many Courses, and one Course contains many Students.

3. SQL Joins

Joins are used to combine rows from two or more tables based on a related column.

Join Type Description
Inner Join Returns only the records that have matching values in both tables.
Left Join Returns all records from the left table and the matched records from the right.
Right Join Returns all records from the right table and the matched records from the left.
Full Outer Join Returns all records when there is a match in either the left or right table.
Cross Join Returns the Cartesian product (every possible combination) of the two tables.
Self Join A regular join in which a table is joined with itself.

4. Database Schemas

A schema is the logical configuration that defines how tables are organized and managed.

A. Star Schema

The standard for data warehousing and Power BI. It features a central Fact Table (quantitative data) surrounded by Dimension Tables (descriptive data).

Visual Representation:

          Dim_Customer
                |
Dim_Product -- Fact_Sales -- Dim_Date
                |
          Dim_Store
Enter fullscreen mode Exit fullscreen mode
  • Pros: Simple to understand, high query performance, optimized for reporting.
  • Cons: Includes some data redundancy (denormalized).

B. Snowflake Schema

An extension of the star schema where dimension tables are normalized (split into additional related tables).

Visual Representation:

          Country_Table
               |
          Dim_Customer
               |
Dim_Product -- Fact_Sales -- Dim_Date
Enter fullscreen mode Exit fullscreen mode
  • Pros: Reduced data redundancy and improved data integrity.
  • Cons: Requires more complex queries and more joins, which can impact performance.

C. Galaxy Schema (Fact Constellation)

A complex design where multiple fact tables share the same dimension tables.

  • Structure: Multiple Fact tables (e.g., Sales and Returns) linked to shared dimensions (e.g., Date and Product).
  • Pros: Supports complex business processes and enterprise-level analytics.
  • Cons: Highly difficult to design and maintain.

5. Integrity Keys

Keys are the unique identifiers that maintain the relationship between tables.

  • Primary Key (PK): A unique identifier for a specific row within its own table. It cannot contain null values.
  • Foreign Key (FK): A column in one table that points to the Primary Key of another table, creating a link between the two.

6. Normalization vs. Denormalization

  • Normalization: The process of organizing data to minimize redundancy (used in OLTP - transactional systems like banking apps).
  • Denormalization: The process of combining tables to speed up data retrieval (used in OLAP - analytical systems like Power BI or Tableau dashboards).

In Conclusion

  1. Identify Entities: What are the "things" you are tracking? (People, Places, Transactions).
  2. Define Keys: Every table needs a Primary Key (unique ID). Connect tables using Foreign Keys.
  3. Choose your Schema: Use Star Schema for most reporting needs. Use Normalization if you are building an app that handles many live transactions.
  4. Optimize Joins: Always join on indexed columns (like IDs) to ensure your queries don't slow down as your data grows.

Top comments (0)