GeraldM

Posted on Feb 1

Understanding Schemas and Data modeling in Power BI

#analytics #beginners #data #microsoft

Introduction

Power BI is a business intelligence and data visualization tool developed by Microsoft that allows user to connect to multiple data sources, transform and model data, and create reports and dashboards. In this article we will go through the various schemas and data models in Power BI.

Let's begin with Schemas

What is a schema?

In Power BI, a schema refers to the structure and organization of data within a data model by defining how data is connected and related within a model. By understanding schemas, we are able to design the best data models that enable comprehensive analysis.

Types of Schemas in Power BI

1. Star Schema

A star schema consists of one central fact table surrounded by dimension tables forming a star like pattern. It is a simple and commonly used schema in data warehousing.

A fact table consists of measurable data like Sales, while a dimensions table holds descriptive attributes related to the facts like Customers and Date.

Star schemas are ideal for straightforward reporting and querying making them suitable for dashboards and summary reports.

2. Snowflake Schema

A Snowflake schema is an extension of a star schema where dimension tables are further divided into sub-dimension tables (multiple related tables) creating a structure that looks like a snowflake.
For Example: A products dimension table can split into categories and brand names.
This splitting of tables reduces data redundancy by introducing more tables and joins which saves on storage.

Snowflake schemas are ideal in scenarios where storage optimization is critical and a detailed data model is a requirement.

3. Galaxy Schema (Fact Constellation Schema)

A Galaxy schema is also known as a fact constellation schema, it involves multiple fact tables that share dimension tables forming a complex interconnected constellation looking like data model.

Galaxies schemas are ideal for large-scale enterprise environment where multi-process/interrelated metrics analytics is required. For example analysis of finance and operations.

How to implement Schemas in Power BI

a. Star Schema

Setup fact and Dimension tables: Identify and created a central fact table and surrounding dimension tables. For example: A sales table being a fact table surrounded by a product and Customer table as dimensions tables as these tables provide descriptive information about the sale.
Link tables: Establish a link between the fact table and the dimensions table using foreign keys. For example, the sales table is linked to the customer table using the customer ID foreign key.

b. Snowflake Schema

1.Normalize Dimension tables: Normalize the dimension tables by splitting them into related sub-tables. For example, splitting the product table into category and brand tables.

2.Create relationships: Define the relationships between sub-tables and the main tables while maintaining referential integrity. For example the customer table can be split into city and Region tables where the table are related to the customers table through the city ID field.

3.Optimize storage: Use appropriate storage and indexing strategies to manage complex joins efficiently.

c. Galaxy schema

1.Identify Fact tables: Determine the various fact tables needed for various business processes. For example a sales fact table and a shipping fact table.

2.Identify shared dimensions tables: determine shared dimension tables that link the fact tables. For example, the sales fact table and the shipping fact table can be linked by a product dimensions table.

Data Modeling

What is data Modeling?
A data model is the structure of the data that we create for businesses. While data modeling is the process of structuring and organizing data from various sources into a coherent semantic data mode by defining tables, establishing relationships between them, creating calculated columns and measures using data analysis expressions (DAX), setting hierarchies and optimizing for performance.

Types of Data Models

1. Conceptual data modeling

They offer a big picture of what the system will contain, how it will be organized and which business rules are involved. Conceptual data modeling answers the following questions: What data do we need? What does the business desire? What data do we have access to? Where can we find this data?

2. Logical data modeling

They provide a greater detail about the concepts and relationships in the domain under consideration. They include facts (Events eg purchase) and dimensions (Actors eg customers) and how their relationships.

3. Physical Data modeling.

They provide a schema for how data will be physically stored within a database. They offer a finalized design that can be implemented as a relational database. Physical data models answer the following questions:
What are the columns of your data? What are the data types of your data? How are you storing this data? How can you compress this data to make it smaller?

Data Modeling Process

The following are a sequence of tasks to be performed in an iterative manner, building a workflow of creating data models.
1. Identify entities: Data modeling process begins with the identification of the things, events or concepts (entities) that are represented in the data set that is to be modeled.
2. Identify key properties of each entity: Each entity can be differentiated from others because it has one or more unique properties, called attributes. For example, an entity called “customer” might possess such attributes as a first name, last name and telephone number, while an entity called “address” might include a street name and number, a city, state, country and zip code.
3. Identify relationships among entities: Specify the nature of relationship each entity has with the others. For example, the relationship between a customer and address table is that the customer lives at the address.
4. Map attributes to entities completely: This ensures that the model reflects how the business will use the data.
5. Assign the keys as needed and decide on a degree of normalization that balances the need to reduce redundancy with performance requirements.
Normalization is a technique of organizing data models in which numerical identifiers, called keys are assigned to groups of data to represent relationships between them without repeating the data.
6. Finalize and validate the data model: Data modeling is an iterative process that should be repeated and refined as business needs change.

Types of Data Modeling

1. Hierarchical data modeling
In this model, each record has a single root parent which maps to one or more child tables creating a tree like format.

2. Relational data modeling
In this model, data segments are explicitly joined through the use of tables, reducing database complexity.

3. Entity-relationship (ER) data modeling
Formal diagrams are used to represent the relationships between entities in a database.

4. Object-oriented data modeling
Objects are grouped in class hierarchies and have associated features.

5. Dimensional data modeling
Designed to optimize data retrieval speeds for analytic purposes in a data warehouse.

Why perform Data Modeling?
Data modeling makes it easier for developers, data architects, business analysts, and other stakeholders to view and understand relationships among the data in a database or data warehouse

In addition to that, data modeling can:

Reduce errors in software and database development
Improve application and database performance.
Ease data mapping throughout the organization.
Improve communication between developers and business intelligence teams.
Ease and speed the process of database design at the conceptual, logical and physical levels.

Conclusion

In this article, we have gone through schemas and data modeling. Understanding different schemas in Power BI is crucial for designing efficient data models. Each schema has unique advantages and by choosing the right schema, you can improve query performance, data storage efficiency and data refresh operations. By mastering these schemas, you can create robust and scalable data models, enabling your organization to make data-driven decisions effectively.

Data modeling is a critical foundation for effective data management and analytics as it provides a clear and structured way to organize data in alignment with business needs. By progressing through conceptual, logical and physical models, organizations can be able to move from high-level business requirements to detailed, implementable database designs.

DEV Community