Dummy Guide to Schemas and Data Modelling In Power BI

#data #powerbi #programming #analytics

Here is a super simple explanation of schemas and data modelling in Power BI. Think of Power BI like building with interlocking blocks.

What is Data Modelling?

Data Modelling is deciding which block is going to interlock with another, and how they should interlock so that you can build something useful.
You take all your messy tables, organize them, and connect them properly so that Power BI can, in turn, answer questions like:

"How much did each client spend last year?"
"Which products sell best in a particular country?"

What is a Schema?

First off, we need to know what a schema consists of. It has the fact table, which is the biggest table with lots of rows: it can contain numbers you want to add up (measures) or IDs linking to other tables.

It also consists of Dimension tables, which are smaller tables with unique rows that also contain descriptive attributes you use to filter.

The schema is the final blueprint that shows:

The tables you have
The columns inside each table
How the tables are connected to each other

There are 3 very common schemas in PowerBI:

1. Star Schema

This is the easiest for beginners and highly recommended in Power BI. It looks like one big central table with many small tables around it.

The fact table has the numbers you want to add up: Amount, Quantity e.t.c.
All the other tables are called dimension tables; they connect to the fact table usually with one number column, e.g., Customer ID, Product ID, or Date.

2. Snowflake Schema

It looks like star schema, but some dimensions are split further. It still has the fact table, but the dimension tables are split into extra levels.

It's best for very large data bases where storage matters a lot. Only use when you really need it, as it's harder and slightly slower than star schema.

Something to know,

Relationships - How tables talk to each other

In Power BI, it is a line connecting two tables so filters flow between them.
Relationships almost always in good models → one-to-many because it offers fast calculations, correct filtering, and no wrong totals, there's also:

One Side- Has unique values, for example, customer ID in customer table, each ID appears only once.
Many Side- Has repeating values, many rows can have the same values, for example, customer ID in sales table, same customer buys many times.

Overall, stick with star schema, it's what most good Power BI models use.