samkaruri

Posted on Apr 5

Understanding Data Modeling in Power BI: Joins, Relationships and Schemas Explained

#powerbi #datamodeling #businessintelligence #dataanalytics

Data Modeling in Power BI.

A perfect data modeling is like a great relationship; clear, structured and created on the right connections.
Data modeling in Power BI is a fascinating process where data relationships are created from tables and subsequently transformed into meaningful insights and interactive report. This process entail setting relationships, structures and calculations among the data table so that power BI can analyze and visualize them correctly.

In this article, we shall dive into how Power BI creates data models by expounding on the topics below.

1. Joins.

In power BI, joins are achieved through relationships in the model view or by merging queries in Power queries. The relationship/model view is more flexible approach as it creates a "virtual" join between tables via Key.

Left Outer Joins.

This is the most commonly used join for data modeling. Preserves all rows from the left table. Matching rows from the right table are appended; unmatched left rows get null values in the right-table columns.

Example: Imagine Safaricom runs an SMS campaign asking customers to rate a new M-Pesa feature:
Customers who reply, their feedback is recorded
Customers who don’t reply, still exist in the customer list, but feedback = null

Right Outer Joins.

RIGHT JOIN returns all rows from the table on the right side of the JOIN. If there are no matching rows in the left table, NULL values will be returned.
It joins two tables, e.g., Sales Details and Client Details, on their common ClientID field and returns all rows from Client Details along with any matching rows from Sales Details, returning NULL values for unmatched fields. The query selects the Order rid, Client Name, and Order Date, ordering the results by descending Order Date.

Full Outer Joins.

Retains all rows from both tables. Matching records are merged, while non-matching rows from either side are kept with null values in the missing fields.

Example: Comparing employee records from two separate HR systems. All employees from both systems are included, with nulls indicating where records do not match.

Inner Joins.

Returns only rows with matching keys in both tables. Unmatched rows from either side are discarded.

Example: Sales analysis limited to currently active employees. An inner join on Employee ID drops sales records tied to terminated employees and excludes active employees with no sales.

Left Anti & Right Anti.

A Left Anti join returns only the rows from the left table that do not have a corresponding match in the right table; any matching rows are excluded.

Example: When coming up with a campaign target list, a Left Anti Join removes clients who have already been contacted, leaving only those who have not yet been reached.

Returns only the rows from the right table that have no corresponding match in the left table.

Example: Comparing registered passengers in a bus booking system with actual travelers recorded at a terminal. A Right Anti Join highlights passengers who boarded but are not captured in the booking system, pointing to unrecorded tickets or system gaps.

2. Relationships In Power BI.

Relationships in Power BI are the connections between tables in a data model. They allow you to combine data from multiple tables for accurate calculations, filtering, and visualizations—without merging everything into one giant "flat" table.
Think of relationships as the "glue" that tells Power BI how rows in one table relate to rows in another. This is essential for star schema or snowflake schema designs, where you have fact tables (containing measurable numbers like sales, orders) and dimension tables (containing descriptive attributes like products, dates, customers).

3. Schemas: Star, Snowflake, and Flat Table.

Data modeling in Power BI relies heavily on how data is structured. Choosing the right schema Star, Snowflake, or Flat Table directly impacts performance, ease of analysis, and overall user experience. Each approach offers different experience between simplicity, efficiency, and flexibility, making it important to understand when and how to use them effectively. Let look at each of them.

Star Schema.-Highly recommended.

The Star Schema is the most recommended approach in Power BI. It consists of a single central Fact table connected directly to multiple Dimension tables through simple one-to-many relationships. There are no intermediate lookup layers or complex relationship paths.

To support this structure, Dimension tables are intentionally denormalized during data preparation. For example, instead of maintaining separate tables for Product, Product Subcategory, and Product Category, all attributes are combined into one unified Product dimension.

Why it works effectively in Power BI:

Performance: Queries require only a single relationship step between dimensions and facts, which aligns well with the VertiPaq engine’s optimization.
Simplified DAX: Filters flow directly from dimension tables to the Fact table without requiring complex logic.
User-friendly design: Each business entity is represented by one table, making it easier for report creators to understand and use.

The main drawback is data redundancy, where repeated text values may appear across many rows. However, Vert iPAQ minimizes this impact through dictionary encoding, storing repeated values efficiently in memory.

Snowflake Schema

A Snowflake Schema normalizes Dimension tables into multiple related sub-tables, forming hierarchical structures. For instance, a Product table may link to a Product Subcategory table, which then connects to a Product Category table.

While this design improves storage efficiency and enforces data consistency, it introduces challenges in Power BI. Queries become slower due to multiple relationship steps, and DAX expressions grow more complex. Filter propagation across several tables can also lead to ambiguous relationships and unexpected results.

Because of these limitations, it is generally advised to use Power Query to flatten Snowflake structures into a Star Schema before loading data into the model.

In a star schema, dimensions are denormalized—flattened into a single table per entity—allowing for faster query performance and simpler relationships. Snowflake schemas, where dimensions are normalized into multiple related tables, can still be used in complex scenarios, but they increase DAX complexity and may hurt performance.

Flat Table.

The Flat Table approach combines all data into a single large table, eliminating the separation between Fact and Dimension tables.

In Power BI Import mode, this is considered a poor practice. Including large volumes of repeated descriptive data alongside transactional records increases memory usage, slows data refresh, and complicates DAX calculations. Even simple operations, like removing filters on a single attribute, become more difficult compared to a Star Schema.

There is, however, one valid use case: Direct Query mode. When Power BI sends queries to external systems like Snowflake, Google Big Query, or Databricks, a pre-joined Flat Table can reduce the need for runtime joins and improve query performance.

Conclusion

A perfect data model in Power BI is much like a great relationship; clear, well-structured, and built on the right connections. When tables are properly joined through relationships, organized into an optimal schema, and supported by thoughtful calculations, your data transforms from raw numbers into powerful, actionable insights.

Throughout this article, we explored the foundational elements of effective data modeling:

Joins (via Power Query merges or model relationships) determine how accurately and flexibly your tables connect. Whether using Left Outer joins to preserve all customers in a campaign analysis or Anti joins to identify gaps, choosing the right join ensures you retain the data you need without unnecessary duplication.
Relationships act as the invisible yet critical glue that enables seamless filter propagation and accurate DAX calculations across your model.
Schemas define the overall architecture of your data model. The Star Schema stands out as the gold standard for most Power BI projects because it delivers superior query performance, simplified DAX expressions, and an intuitive structure that aligns perfectly with Power BI’s VertiPaq engine. While the Snowflake Schema offers better normalization in highly complex hierarchical data, it often comes at the cost of slower performance and increased complexity. The Flat Table approach, though tempting for its simplicity, is generally discouraged in Import mode due to higher memory consumption and maintenance challenges — though it can have niche value in certain DirectQuery scenarios.

Ultimately, investing time in building a clean, star-schema-based data model pays dividends in speed, scalability, maintainability, and user experience. A well-designed model not only makes your reports faster and more reliable but also empowers business users to explore data confidently without needing constant support from analysts.
Remember: Great reports don’t start with beautiful visuals — they start with solid data modeling. By mastering joins, relationships, and schema design, you lay a strong foundation that turns raw data into meaningful stories and smarter business decisions.
Start applying these principles in your next Power BI project, and you’ll quickly see the difference a thoughtful data model can make. The clearer your model, the brighter your insights.

DEV Community