Kahuthu Muriuki

Posted on Apr 5

Understanding Data Modeling in Power BI: Joins, Relationships, and Schemas Explained.

#datamodelling #analytics #dataviz

Introduction

Working across fintech environments — raw data rarely comes in a shape that is ready for analysis. You are almost always dealing with multiple tables, spread across different systems, each holding a piece of the full picture. Before you can build a single meaningful report in Power BI, you need to understand how those pieces connect.

That is what data modelling is about. It is the process of organising data into structured tables and defining how those tables relate to each other, so that reports are accurate, filters work correctly, and performance does not collapse under the weight of a large dataset. In Power BI, this modelling work sits between loading your data and building your visuals — and if you skip or rush it, everything downstream gets harder.

This article walks through the key building blocks: joins, relationships, cardinality, schemas, and a few practical mistakes worth avoiding.

1. Joins: Combining Tables at the Query Level

Before we get into how Power BI manages relationships, it helps to understand joins — because joins are how most data professionals first learn to connect tables, especially when working in SQL or within Power Query.

A join merges data from two tables into a single result set based on a shared column. Think of a payments table and a member accounts table in a SACCO system. Both tables share a member ID. A join uses that shared column to bring relevant records together.

But not every join behaves the same way, and choosing the wrong one can silently distort your analysis.

Inner Join

An inner join returns only the records where a match exists in both tables. If a member account has no corresponding payment record, it is excluded entirely. This is useful when you only care about records with complete data on both sides.

Left Join (Left Outer Join)

A left join keeps every record from the left table and pulls in matching records from the right table. Where there is no match, the right-side columns are left blank. This is particularly useful in compliance contexts — for example, when you need a full list of registered customers and want to flag those without a corresponding KYC document on file.

Right Join (Right Outer Join)

The mirror of a left join. All records from the right table are retained, and matching records from the left are attached. Non-matching left-side records are dropped.

Full Outer Join

Keeps everything from both tables, matched or not. Gaps appear where there is no corresponding record on either side. This is helpful when you want a complete picture — for instance, a reconciliation view showing all expected transactions alongside all actual ones, with clear gaps where discrepancies exist.

Left Anti Join

Returns records from the left table that have no match in the right table. In a brokerage context, this could surface client accounts that have never placed a trade — useful for identifying inactive accounts during a portfolio review.

Right Anti Join

Returns records from the right table with no match on the left. This could identify transactions that reference account IDs not found in your master client register — a red flag in any compliance or AML screening process.

Where to Apply Joins in Power BI

Joins in Power BI live in Power Query, accessed via Home → Transform Data → Power Query Editor. From there, you select the two tables, choose the matching columns, and pick your join type. The result is a merged table that can be loaded into your model.

The important thing to note is that joins are a data preparation step. They physically combine records into a single table. That is different from relationships, which we will cover next.

2. Relationships: The Backbone of Your Data Model

Once data is loaded into Power BI, the preferred way to connect tables is through relationships — not by repeatedly merging them. A relationship is a defined link between two tables based on a shared column. Define it once, and Power BI uses it automatically across every visual you build.

Consider a SACCO reporting scenario: you have a transactions table recording every deposit, withdrawal, and loan repayment, and a members table holding member details. A relationship on the member ID column means Power BI knows how to connect a member's name to their transaction history, without you writing a join every time you build a chart.

This is the fundamental difference between joins and relationships:

Joins are temporary and create a physical merged table. They increase data volume.
Relationships are persistent and logical. They keep tables separate but connected, and Power BI resolves the link at query time.

For large datasets — say, transaction logs from a payments platform — keeping tables separated through relationships rather than merged through joins is far more efficient.

Creating Relationships in Power BI

Power BI will attempt to auto-detect relationships when column names match across tables. This is convenient but unreliable. I have seen it try to link two columns both named date, from entirely unrelated tables, producing nonsensical filter behaviour. Always review auto-detected relationships and confirm or correct them manually.

There are two ways to create relationships:

Model View — drag a column from one table and drop it onto the matching column in another. A line appears between the tables showing the link.
Manage Relationships — available on the Home ribbon. This dialogue lets you create, edit, and delete relationships in one place, and gives you explicit control over cardinality and filter direction.

3. Cardinality: Defining How Tables Relate

Cardinality describes the numerical nature of the relationship between two tables — specifically, how records on one side correspond to records on the other.

One-to-Many (Most Common)

One record in the first table corresponds to many records in the second. In a securities brokerage setup, one client maps to many trades. The clients table holds one row per client; the trades table holds one row per trade, with the client ID repeated across multiple rows. This is the relationship type you will use most often — between dimension tables and fact tables.

Many-to-One

The same relationship viewed from the other direction. Many trades belong to one client. Power BI treats one-to-many and many-to-one as equivalent, depending on which table you anchor from.

One-to-One

Each record in one table matches exactly one record in another. This is uncommon but does come up — for example, linking an employee record to their biometric access profile where there is a strict one-person-one-profile constraint.

Many-to-Many

Multiple records in one table can relate to multiple records in another. This occurs in scenarios like a many-to-many product bundling setup, where one product can appear in multiple bundles, and each bundle can contain multiple products. These relationships require careful handling. Power BI supports them, but they can produce unexpected filter behaviour if not understood properly. Where possible, introduce a bridge table to resolve the many-to-many into two one-to-many relationships.

4. Cross-Filter Direction

When a relationship exists, Power BI needs to know which direction filters should travel. This is the cross-filter direction setting.

Single direction is the default and recommended approach. Filters flow from dimension tables toward the fact table. Select a specific security in a product's dimension, and the trade fact table updates to show only transactions involving that security.

Both directions allow filters to propagate in either direction between tables. This can be necessary in some advanced reporting scenarios, but it introduces complexity. Filters can cascade in unexpected ways across the model, making results harder to predict and debug. Use bidirectional filtering sparingly and document why it is needed when you do.

Active vs. Inactive Relationships

Power BI allows multiple relationships between two tables, but only one can be active at a time. The active relationship is the one Power BI uses by default in visuals.

Inactive relationships are not ignored — they exist in the model and can be called on explicitly using DAX functions like USERELATIONSHIP(). A common use case is date tables, where you might have separate relationships for trade date, settlement date, and value date, but only one can be active in the default context.

5. Fact Tables and Dimension Tables

Most professional data models are built around a clear separation between fact tables and dimension tables.

Fact tables store events or transactions. They record what happened — a payment processed, a trade executed, a loan disbursed, a member's deposit posted. Each row in a fact table is a single event. Fact tables tend to be long (many rows) and relatively narrow — they hold numeric measures and foreign keys, not descriptive details.

Dimension tables provide context for those events. They answer the who, what, where, and when. A members table, a products table, a branches table, a calendar table — these are all dimension tables. They tend to be shorter (fewer rows) and wider, with descriptive attributes that give your visuals meaningful labels.

The relationship between dimension tables and the fact table is almost always one-to-many: one member, many transactions. This structure is the foundation of the star schema.

6. Data Modelling Schemas

A schema is the overall structure of how your tables are arranged and connected. Choosing the right schema affects model performance, report flexibility, and how easy the model is to maintain as data grows.

Flat Table

Before any modelling happens, data often arrives as a flat table — one large sheet with every column crammed in: customer name, branch, product, transaction amount, date, currency, all in one place. This is common with data exports from core banking systems or when scraping transaction logs.

Flat tables are readable at a glance, but they scale poorly. Every time a customer name or branch label appears in a transaction, it is repeated in full. If a branch is renamed, you update hundreds or thousands of rows. Storage grows faster than it should, and query performance suffers. Flat tables are a starting point, not a destination.

Star Schema

The star schema is the workhorse of Power BI data modelling. One central fact table sits at the middle, with dimension tables connected directly around it. Viewed in Model View, it literally looks like a star.

In a payments context, your central fact table might hold every payment transaction — amount, currency, timestamp, sender ID, receiver ID, method ID. Surrounding it are dimension tables: a members/accounts table, a currency reference table, a payment methods table, a calendar table. Each dimension connects directly to the fact table on a one-to-many basis.

The advantages are real:

Descriptive data is stored once in dimension tables instead of being repeated in every transaction row.
The model is easy to read. You can look at the diagram and immediately understand how everything connects.
Power BI performs well with this structure. Filters propagate predictably from dimensions to the fact table.

For most reporting work — dashboards, KPI monitoring, regulatory reporting — the star schema is the right choice.

Snowflake Schema

The snowflake schema extends the star by normalising the dimension tables. Instead of one flat members dimension, you might split it into a members table, a regions table, and a countries table — each linking to the next. This breaks down hierarchical data into its component layers.

Normalisation reduces redundancy further and ensures data consistency. If Kenya is spelt "Kenya" in the countries table and linked by ID everywhere else, you never have to worry about "kenya" or "KENYA" appearing elsewhere.

The tradeoff is complexity. More tables mean more relationships to manage, more potential for misconfiguration, and more joins for Power BI to resolve at query time. Snowflake schemas make sense for very large datasets with deep hierarchies, or when the source data is already structured that way from a warehousing system. For most operational dashboards and management reporting, the star schema performs better and is much easier to maintain.

7. Common Mistakes and How to Avoid Them

A few problems come up regularly when building data models, particularly if you are moving quickly or working with unfamiliar data.

Missing relationships are the most common. If your tables are not properly linked, visuals behave unexpectedly — a slicer that should filter your transactions does nothing, or totals appear correct but do not break down properly. Always confirm that every connection between a fact table and its dimensions is in place.

Wrong cardinality can produce duplicated or inflated numbers. Setting a one-to-many relationship as many-to-many when the data does not warrant it causes Power BI to count records multiple times. Before confirming a relationship, verify whether your key column truly has unique values on the "one" side.

Descriptive columns in the fact table bloat the model unnecessarily. A member's name, branch label, or product category has no business sitting in every transaction row. That information belongs in dimension tables. Keep fact tables lean: foreign keys and numeric measures only.

Overusing bidirectional filters creates ambiguity. In a model with several connected tables, bidirectional filters can cause cascading filter propagation that makes results unpredictable. Default to single-direction filters unless you have a specific, well-understood reason to enable both.

Unnecessary snowflaking adds complexity without proportionate benefit. If a simple star schema serves your reporting needs, building a multi-level snowflake structure because it feels more rigorous will only make the model harder to troubleshoot. Start simple and add complexity only when the data or the reporting requirements genuinely demand it.

8. Putting It Together: A Practical Walkthrough

Say you are building a member activity dashboard for a SACCO. You have three tables: transactions, members, and loan_products.

Step 1 — Load data: Import all three tables into Power BI via Power Query.

Step 2 — Clean data: In Power Query, remove duplicate records, standardise column names (use member_id consistently, not ID in one table and MemberID in another), and handle any null values in key columns.

Step 3 — Build relationships: In Model View, link transactions[member_id] to members[member_id] with a one-to-many relationship. Link transactions[product_id] to loan_products[product_id] the same way.

Step 4 — Apply star schema: transactions is your fact table. members and loan_products are your dimension tables.

Step 5 — Build visuals: Now you can create a bar chart showing total disbursements by loan product, a table breaking down transaction volumes by member tier, and a slicer filtering all visuals by branch — all from a single clean model.

Conclusion

Data modelling is where reporting either earns its reliability or loses it. The mechanics — joins, relationships, cardinality, schemas — are not just theoretical abstractions. They directly determine whether a slicer filters correctly, whether a total is accurate, and whether a model can handle a growing dataset without grinding to a halt.

The star schema, clear one-to-many relationships, and lean fact tables are the foundation of any model worth building on. Get the structure right before you start building visuals, and the visuals become straightforward. Get it wrong, and you end up debugging report behaviour instead of actually analysing data.

DEV Community