DEV Community

Cover image for From Raw Data to Insights: Schemas and Data Modeling
Kinyanjui
Kinyanjui

Posted on

From Raw Data to Insights: Schemas and Data Modeling

Introduction

Data doesn't become useful the moment you collect it, rather, it becomes useful when it is well structured. The structure is what schemas and data modeling are all about. Whether you're working in Excel or PowerBi or a full data warehouse, getting to know, understanding schemas and data modeling is the foundational skill for any data analyst.
This article breaks down schemas and data modeling in an elaborate way well suited for beginers.

Now lets dive in.

A Schema

(The blueprint)
Refers to the way tables are structred and the interelation they have with eachother. in other words, schemas define the logical structure, organization and relationships of tables within a data model. Majorly, they narrow down on structuring data for optimal performance and reporting.

Data Modeling

(The Plan)
It essentially refers to the crucial process of arranging and creating an interelation between data tables to create a logical, efficient sematic model, which in turn helps in creating accurate analysis and powerful reports by defining relationships, creating DAX computations(Measures) and having organized data by using schemas within Power querry and Model view.

Lets now connect the two. Data modeling is the process of designing a blueprint for organizing data based on various business needs and requirements, while a schema is the ground and technical implimentation of that blueptint within DBMS. The interelation is a direct progression from concept to physical reality.
The data model guides the schema design - The schema is the actualization of the model.

Why Schemas Matter

  • Make data easier to understand

  • Reduce query performance

  • Reduce data duplication

  • Support scalable reporting and analysis

Types of Schemas in Power BI

1. Star Schema

Refers to a mature modeling approach widely adopted by the relational data warehouses. It needs user to classify their model tables as their dimension or fact. It is designed for high performance. In addition it consists one or more central Fact tables (With metrics/measures) sorrounded by de-normalised dimensions tables (Containing Descriptive attributes)
The strucure:

Fact Table(Center) >> Contains Numeric measures (Things you aggregate) and foreigns keys to dimensions.

Dimension Tables (Around it) >> Contains descriptive attributes and one row per entity.

Fact and Dimensions

Reasons why Star Schema is recomended

  • Faster performance

  • Simpler DAX

  • Fewer relationship

  • Easy to explain to stakeholders

Tips:
If it describes something>> Dimension
If it is a sum of a number>> Fact

2. Snowflake Schemas

This is a more complex, and widely normalised version of the Star schema. This refers to when dimensions are normalised into multiple tabls. This is further broken down into Sub-dimensions to reduce data redundancy. Its basically an extension of the star schema often utilised for complex, hierachical data, improving storage efficiency at the cost of Query performance due to more required joins.

Pros: Reduces duplication
Cons: Slower, More complex DAX, and harder to maintain.

NOTE: Flatten demensions where possible. Storage is cheap, Confusion is expensive.**

Comparing A Good data model Vs A Bad data model.

Good Model Bad Model
Multiple small dimension tables One big table with over 50 columns
Simple relationships Many to many relationships everywhere
No duplicates Duplicate in columns

Role of Power Query in data modeling.

  • Removes duplicates

  • Creates dimension tables

  • Geenarate surrogate keys

  • Clean and starndardize columns

Common Modeling mistakes and how to avoid them

Using data columns in fact table only Create a proper Data table
Many-to-many relationships Introduce bridge tables
Too many bidirectional filters Redesign tables
Calculated columns everwhere Prefer measures

Data modeling is the bridge between raw data and clean insights, once your model are solid, DAX becomes silple and not scary.
Schemas and Data modeling may feel abstract at first, but they are what seperates messy data from insightful and reliable analytics systsems.Mastering them will Improve your dashboards, reduce errors, and make analysis scalable.

Top comments (0)