Understanding Data Modeling in Power BI: Joins, Relationships, and Schemas Explained.

#powerbi #sqljoins #datamodeling #datascience

Data Modeling

In Power BI, data modeling is the process of arranging data so that it is simple for people to understand, interconnect, and use for analysis. In this sense, it entails defining the relationships between the various data sources or tables in Power BI, such as connecting customers to their orders or products to the stores where they were bought. Rather than being kept in a single huge table, data is divided into smaller, useful tables that are logically connected: for instance, one table of customers, another of the products the customers bought, and another of the stores where they bought them. This increases clarity, so that when users filter, group, or summarize data, a clear data model ensures that reports give correct results and makes it easier to find information.
By organizing data and establishing distinct relationships, data modeling helps transform unprocessed data into valuable insights that support better decision-making. We can say that a data model is a reflection of how a business functions in real-world situations. In a retail operation, for instance, customers make purchases, goods are sold, and sales occur. An effective data model shows these relationships clearly, which helps users answer crucial questions such as which goods sell the most or which stores make the most sales. Without the right modeling, reports can become unreliable, unclear, or even inaccurate. One key takeaway from my Data Science and Analytics class was that, in modeling, every table should have a primary key, a column that uniquely identifies each row. Other tables can then reference that key as a foreign key.
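To make the idea of primary and foreign keys concrete, here is a minimal sketch in plain Python (not Power BI code). The tables, column names, and helper functions are all hypothetical and exist only for illustration. It checks that a key column is unique and finds rows whose foreign key points at nothing.

```python
# Hypothetical sample tables; all names and values are assumptions for illustration.
customers = [
    {"customer_id": 1, "name": "Amina"},
    {"customer_id": 2, "name": "Brian"},
    {"customer_id": 3, "name": "Chen"},
]
orders = [
    {"order_id": 101, "customer_id": 1, "amount": 250.0},
    {"order_id": 102, "customer_id": 1, "amount": 80.0},
    {"order_id": 103, "customer_id": 2, "amount": 40.0},
    {"order_id": 104, "customer_id": 9, "amount": 15.0},  # no such customer
]


def is_valid_primary_key(rows, key):
    """A primary key must be present and unique on every row."""
    values = [row.get(key) for row in rows]
    return None not in values and len(values) == len(set(values))


def orphaned_rows(child_rows, foreign_key, parent_rows, primary_key):
    """Return child rows whose foreign key has no matching parent row."""
    parent_keys = {row[primary_key] for row in parent_rows}
    return [row for row in child_rows if row[foreign_key] not in parent_keys]


customers_pk_ok = is_valid_primary_key(customers, "customer_id")
orphans = orphaned_rows(orders, "customer_id", customers, "customer_id")

print(customers_pk_ok)
print([row["order_id"] for row in orphans])
```

In this sample, customer_id works as a primary key in customers, but not in orders, where it repeats and therefore acts as a foreign key.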

SQL Joins

As mentioned earlier in this article, data can be stored in many tables. SQL joins combine the data from these tables so that related information can be analyzed together. Many businesses keep their data in distinct, structured tables, such as sales transactions in one table and customer information in another. Using joins, businesses and users can combine this data by matching rows on a shared field, such as a customer ID. In Power BI, the Merge Queries feature in Power Query offers six main join types:

Inner Join
Left Join (Left Outer)
Right Join (Right Outer)
Full Outer Join
Left Anti Join
Right Anti Join

For example, if you only want to examine customers who have made purchases, an INNER JOIN returns only the entries that are present in both tables. This makes it easier to concentrate on the important information and keeps unwanted rows out of your analysis.
Continuing the example, to also keep the customers who have not made any purchases, a LEFT JOIN is used: it retains all entries from the first table, even if there is no matching data in the second table. A RIGHT JOIN performs the reverse, retaining all entries from the second table.
A FULL OUTER JOIN incorporates all records from both tables and helps identify dataset discrepancies or missing data. Another helpful way of locating missing data, such as transactions with invalid customers, is to use the LEFT ANTI and RIGHT ANTI joins, which return only the rows from one table that have no match in the other. The following is a step-by-step guide to using joins in Power BI:
Open Transform Data
Choose a base table (your primary business entity)
Merge with another table based on a shared key
Select the join type depending on your goal
Expand only the columns you need
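
The six join types can also be illustrated outside Power BI. The following is a minimal sketch in plain Python, not the way Power Query implements merges. The merge function, its kind names, and the sample tables are hypothetical, and it assumes the two tables share only the key column.

```python
# Hypothetical sample tables; customer 3 has no orders, order 104 has no customer.
customers = [
    {"customer_id": 1, "name": "Amina"},
    {"customer_id": 2, "name": "Brian"},
    {"customer_id": 3, "name": "Chen"},
]
orders = [
    {"customer_id": 1, "order_id": 101},
    {"customer_id": 1, "order_id": 102},
    {"customer_id": 2, "order_id": 103},
    {"customer_id": 9, "order_id": 104},
]

KINDS = ("inner", "left", "right", "full", "left_anti", "right_anti")


def merge(left, right, key, kind):
    """Combine two lists of dict rows on a shared key column.

    Unmatched sides are filled with None, similar to nulls in a merged table.
    Assumes the key is the only column name the two tables have in common.
    """
    if kind not in KINDS:
        raise ValueError(f"unknown join kind: {kind}")

    left_keys = {row[key] for row in left}
    right_keys = {row[key] for row in right}

    # Anti joins keep only the rows of one side that have no partner.
    if kind == "left_anti":
        return [row for row in left if row[key] not in right_keys]
    if kind == "right_anti":
        return [row for row in right if row[key] not in left_keys]

    left_blank = {col: None for row in left for col in row if col != key}
    right_blank = {col: None for row in right for col in row if col != key}

    result = []
    for l_row in left:
        matches = [r_row for r_row in right if r_row[key] == l_row[key]]
        if matches:
            result.extend({**l_row, **r_row} for r_row in matches)
        elif kind in ("left", "full"):
            result.append({**l_row, **right_blank})
    if kind in ("right", "full"):
        for r_row in right:
            if r_row[key] not in left_keys:
                result.append({**left_blank, **r_row})
    return result


for kind in KINDS:
    print(kind, len(merge(customers, orders, "customer_id", kind)))
```

With this sample data, the left anti join returns the one customer without orders, and the right anti join returns the one order whose customer ID does not exist in the customer table.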

Power BI Relationships and the Difference Between Joins and Relationships

Relationships are used in Power BI to connect tables so they can "interact" without being merged into a single table. This makes it possible to analyze data from several tables while keeping the data organized. For instance, continuing our earlier example, a common column like Customer ID may be used to link a sales table to a customer table. When a user selects customers based on any criteria, the sales data immediately changes to represent only those customers. Depending on how the data is organized, relationships in Power BI can be one-to-many, many-to-many, or one-to-one.
Joins and relationships differ primarily in how and when they are applied. Joins are used in Power Query, before analysis starts, to physically merge data into a single table. In contrast, relationships are created in the model view and keep the tables separate while enabling dynamic interaction. To put it simply, a relationship connects data, whereas a join combines data. Relationships enable quicker performance and more engaging reporting, which is why they are usually the preferred approach in Power BI.
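The difference can be sketched in plain Python. This is only an analogy for how a filter on one table flows through a one-to-many relationship to another; it is not how the Power BI engine works internally. The tables, the total_sales function, and the city values are hypothetical.

```python
# Two separate tables linked by customer_id; they are never merged.
customers = [
    {"customer_id": 1, "name": "Amina", "city": "Nairobi"},
    {"customer_id": 2, "name": "Brian", "city": "Mombasa"},
    {"customer_id": 3, "name": "Chen", "city": "Nairobi"},
]
sales = [
    {"sale_id": 101, "customer_id": 1, "amount": 250.0},
    {"sale_id": 102, "customer_id": 1, "amount": 80.0},
    {"sale_id": 103, "customer_id": 2, "amount": 40.0},
    {"sale_id": 104, "customer_id": 3, "amount": 15.0},
]


def total_sales(customer_filter=None):
    """Sum sales, letting a filter on customers flow to the sales table."""
    selected_ids = {
        customer["customer_id"]
        for customer in customers
        if customer_filter is None or customer_filter(customer)
    }
    return sum(sale["amount"] for sale in sales if sale["customer_id"] in selected_ids)


all_total = total_sales()
nairobi_total = total_sales(lambda customer: customer["city"] == "Nairobi")

print(all_total)      # 385.0
print(nairobi_total)  # 345.0
```

The city column lives only in the customer table, yet filtering on it changes the sales total, which is the behavior a relationship provides without duplicating customer details on every sales row.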

Fact vs Dimension Tables

As learned in class today, a fact table usually contains measurable, numerical data that can be analyzed, such as sales volume, quantity sold, or overall profit. These tables document business-related events and transactions, such as purchases and payments. Because fact tables accumulate detailed records over time, they are typically rather big. Additionally, they have keys that link them to other tables, enabling you to analyze the data in various ways. On the other hand, dimension tables hold descriptive details about the business that help the data in fact tables make sense. They contain information such as dates, locations, product categories, and customer names. The data in reports is filtered, grouped, and labeled using these tables. For instance, dimension tables let you view sales by product, location, or time period in addition to overall sales. Compared to fact tables, dimension tables are usually smaller and change less often. It is vital to note that fact and dimension tables work together to provide a structure that facilitates the investigation and understanding of data.
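
As a rough illustration of how the two kinds of table work together, the sketch below uses plain Python with hypothetical tables (fact_sales, dim_product) and a hypothetical helper, sales_by. The fact table supplies the numbers, and the dimension table supplies the labels used for grouping.

```python
from collections import defaultdict

# Dimension table: descriptive attributes, one entry per product (hypothetical data).
dim_product = {
    1: {"product_name": "Notebook", "category": "Stationery"},
    2: {"product_name": "Pen", "category": "Stationery"},
    3: {"product_name": "Kettle", "category": "Appliances"},
}

# Fact table: one row per sale, holding keys and numeric measures only.
fact_sales = [
    {"product_id": 1, "quantity": 2, "amount": 10.0},
    {"product_id": 2, "quantity": 5, "amount": 5.0},
    {"product_id": 3, "quantity": 1, "amount": 30.0},
    {"product_id": 1, "quantity": 1, "amount": 5.0},
]


def sales_by(attribute):
    """Total the sales amount, grouped by any attribute of the product dimension."""
    totals = defaultdict(float)
    for sale in fact_sales:
        label = dim_product[sale["product_id"]][attribute]
        totals[label] += sale["amount"]
    return dict(totals)


by_category = sales_by("category")
by_product = sales_by("product_name")

print(by_category)  # {'Stationery': 20.0, 'Appliances': 30.0}
print(by_product)   # {'Notebook': 15.0, 'Pen': 5.0, 'Kettle': 30.0}
```

The same fact rows answer both questions; only the dimension attribute used for grouping changes.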

Schemas: Star, Snowflake, and Flat Table (DLAT)

In data modeling, schemas describe how tables are arranged and connected. We are going to focus on three main schemas: the star, snowflake, and flat table schemas.
The most popular and preferred schema in Power BI is the star schema. It is shaped like a star and features a central fact table that is directly linked to several dimension tables. This style is well suited to business reporting since it is straightforward and simple to read and understand. For instance, users can quickly examine sales from several angles by connecting a sales table to customer, product, and date tables.
The snowflake schema is a more complex variant of the star schema, because it further subdivides dimension data into smaller, linked tables. For example, a product table could be connected to a category table, which in turn links to a department table. This reduces data duplication, but it also makes the model more difficult for people to understand.
The flat table (DLAT) schema, which can be helpful for small datasets or quick analysis, combines everything into a single large table. However, the repeated data makes flat tables inefficient as data volumes increase.
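The three layouts can be compared with a small sketch in plain Python. All table and function names here are hypothetical, and the data is made up for illustration. It starts from a snowflake chain (product, category, department), collapses it into a single product dimension as in a star schema, and then flattens everything into one table.

```python
# Snowflake: the product dimension is split into three linked tables.
departments = {10: {"department": "Home"}, 20: {"department": "Office"}}
categories = {
    100: {"category": "Appliances", "department_id": 10},
    200: {"category": "Stationery", "department_id": 20},
}
products = {
    1: {"product_name": "Kettle", "category_id": 100},
    2: {"product_name": "Pen", "category_id": 200},
    3: {"product_name": "Notebook", "category_id": 200},
}
fact_sales = [
    {"product_id": 1, "amount": 30.0},
    {"product_id": 2, "amount": 5.0},
    {"product_id": 3, "amount": 10.0},
    {"product_id": 2, "amount": 5.0},
]


def star_product_dimension():
    """Collapse the snowflake chain into one product dimension (star schema)."""
    dimension = {}
    for product_id, product in products.items():
        category = categories[product["category_id"]]
        department = departments[category["department_id"]]
        dimension[product_id] = {
            "product_name": product["product_name"],
            "category": category["category"],
            "department": department["department"],
        }
    return dimension


def flat_table(dim_product):
    """Copy every descriptive column onto every sale (flat table)."""
    return [{**sale, **dim_product[sale["product_id"]]} for sale in fact_sales]


dim_product = star_product_dimension()
flat = flat_table(dim_product)

# "Stationery" is stored once in the snowflake, twice in the star dimension,
# and three times in the flat table.
print(sum(1 for row in dim_product.values() if row["category"] == "Stationery"))
print(sum(1 for row in flat if row["category"] == "Stationery"))
```

The counts at the end show the trade-off described above: each step toward a flatter layout is simpler to read but repeats more descriptive data.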

Role-Playing Dimensions and Common Modeling Issues

A role-playing dimension is a single table that is used for several purposes. For example, one Date table can be related to the order, shipment, and delivery date columns of a sales table. Only one of these relationships can be active at a time, but the inactive ones can still be used in formulas when necessary, for example with the DAX function USERELATIONSHIP. Common issues in data modeling include too many relationships, relationship loops, duplicate information, and unclear keys. These can cause reports to produce inaccurate results. Such issues can be avoided, and analysis made easier and more accurate, by keeping the model straightforward, using clear keys, and carefully managing relationships.
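
To illustrate the role-playing idea, here is a minimal sketch in plain Python. It is an analogy only, not DAX and not how Power BI stores relationships. The tables, the ACTIVE_DATE_COLUMN constant, and the sales_by_month function are hypothetical. One date table serves two roles, with the order date treated as the default ("active") link and the ship date chosen explicitly when needed.

```python
from collections import defaultdict
from datetime import date

# One shared date dimension (hypothetical data).
dim_date = {
    date(2024, 1, 30): {"month": "2024-01"},
    date(2024, 1, 31): {"month": "2024-01"},
    date(2024, 2, 1): {"month": "2024-02"},
    date(2024, 2, 2): {"month": "2024-02"},
}

# The fact table has two date columns that both point at dim_date.
fact_sales = [
    {"order_date": date(2024, 1, 30), "ship_date": date(2024, 1, 31), "amount": 100.0},
    {"order_date": date(2024, 1, 31), "ship_date": date(2024, 2, 1), "amount": 50.0},
    {"order_date": date(2024, 2, 1), "ship_date": date(2024, 2, 2), "amount": 25.0},
]

# Stand-in for the single active relationship.
ACTIVE_DATE_COLUMN = "order_date"


def sales_by_month(date_column=ACTIVE_DATE_COLUMN):
    """Total sales per month, using the chosen date column to reach dim_date."""
    totals = defaultdict(float)
    for sale in fact_sales:
        month = dim_date[sale[date_column]]["month"]
        totals[month] += sale["amount"]
    return dict(totals)


by_order_month = sales_by_month()
by_ship_month = sales_by_month("ship_date")

print(by_order_month)  # {'2024-01': 150.0, '2024-02': 25.0}
print(by_ship_month)   # {'2024-01': 100.0, '2024-02': 75.0}
```

The second sale was ordered in January but shipped in February, so the monthly totals differ depending on which role the date table plays.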