Benta Okoth

Posted on Feb 2 • Edited on Feb 3

Microsoft Power BI

#analytics #analyst #datascience

Introduction

Power BI - Is a complete reporting solution that offers data preparation, data visualization, distribution and management through development tools and an online platform.

There are Three Major Components of Power BI

Power BI Desktop(desktop application application) - For creating semantic models and reports with visualization.
Power BI Service(online platform) - Dashboards from published reports and distribute content with aps
Power BI Mobile(cross-platform mobile app) - For on-the-go access to the PBI service content, designed for mobile.

The Flow of Power BI is :

Connect to data with Power BI Desktop
Transform data with Power Query Editor
Model data with Power BI Desktop
Create Visualization and reports with Power BI Desktop
Publish Reports to Power BI Service
Distribute and Manage reports in Power BI service.

Building Blocks of Power BI

Semantic models - consist of all the connected data, transformation, relationships and calculations.
Visualization- when you create a visualization , you add it to the canvas for a report page.

1. Get Data In Power BI

Steps :
Open Power BI Desktop
Connect to different data sources
Preview source data with Power Query

(i)Flat File- We can get data from different flat files and then as well be able to change the data source.

Open Power BI click on Get Data then under common data sources choose Excel Workbook or Text/csv.

(ii) Relational data- We can as well get data from relational data sources

Get data -> Click on from SQL Server(enter servers and database).
Data Connectivity mode > Import
Advanced Options to type your SQL Query
Change data source settings under Transform data Power BI uses a view, when it retrieves data, it participates in query folding , a feature of Power Query.

(iii) NoSQL database - You can get data from NoSQL database like Azure, Azure Cosmos db, data stored in JSON format.

(iv) Online Service - We can get data from online services eg SharePoint, OneDrive, Dynamic 365, Google Analytics then enter the URL.

(v) Azure Analysis Service - A fully managed PaaS. Its a similar approach as getting data from sql server.

authenticate to the server
pick the model you want to use
select

Power BI Storage Model

We have three modes of storage:
Import
Direct Query
Dual(composite)

2. Transform data with Power Query Editor

Power Query helps with data cleaning and profiling.
The value or table in a column represents that the table has relationship with other tables and can be used to join tables.
At Power Query Editor helps profile data in PBI.

Profiling data is about studying the naunces of the data : determining anomalies, examining and developing the underlying data structures and querying data statistics eg row counts, value distribution, min and max values, average.
under View -> Data Preview
-> Check Column Quality
-> Column Profile
-> Column Distribution
-> Value Distribution
-> Column Statistics

When the Distinct = Unique it means the column contains unique values and this helps create 1-1 relationship.
Power Query Editor helps us with data cleaning, transformation and then load, that is, (i) Unpivot column, pivot columns and transpose columns.

(ii) Resolve inconsistencies, unexpected or null values, data quality issues.

(iii) Apply user-friendly value replacement.

(iv) Profile data so you can learn more about a specific column before using it.

(v) Evaluate and transform column data types.

(vi) Apply data shape transformations to table structures.

(vii) Combine queries.

(viii) Apply user-friendly naming convention to columns and queries.

(ix) Edit M code in the Advanced Editor.

The Various Power BI Functions and their Categorization

DAX functions (Modeling & calculations)

Used for measures, calculated columns, and calculated tables.
Common categories

Aggregation SUM(), AVERAGE(), COUNT(), DISTINCTCOUNT()
Filter / context CALCULATE(), FILTER(), ALL(), ALLEXCEPT()
Time intelligence DATEADD(), TOTALYTD(), SAMEPERIODLASTYEAR()
Logical IF(), SWITCH(), AND(), OR()
Lookup / relationships RELATED(), LOOKUPVALUE(), USERELATIONSHIP()
Table functions VALUES(), SUMMARIZE(), ADDCOLUMNS()

Used when: you’re creating KPIs, ratios, YoY growth, rolling averages, etc

Power Query (M) functions (Data transformation) Used in Transform Data before the model loads.

Common categories

Text Text.Upper, Text.Lower, Text.Trim, Text.Contains
Date/Time Date.Year, Date.Month, Date.AddDays
Number Number.Round, Number.Abs
Table Table.SelectRows, Table.AddColumn, Table.Merge
List List.Sum, List.Distinct
Conditional logic if … then … else

Used when: cleaning data, shaping columns, merging tables, fixing formats.

Power BI visual-level “functions” Not written as code, but still important.

Visual aggregations (Sum, Avg, Min, Max)
Filters (Top N, Relative Date)
Drill-down / drill-through
Tooltips
Conditional formatting
Visual calculations (newer feature)

Used when: shaping how data appears on the report.

Different Schemas In Power BI

A schema refers to the structure and organization of data within a data model.
Schemas define how data is connected and related within the model, influencing the efficiency and performance of data queries and reports.
Understanding schemas helps in designing best data models that support comprehensive analysis

Types of Schemas in Power BI

1. STAR SCHEMA

Definition: The star schema is a simple and commonly used schema in data warehousing. It consists of a central fact table surrounded by dimension tables, forming a star-like pattern.

Structure: The central fact table contains quantitative data (e.g., sales), while the dimension tables hold descriptive attributes related to the facts (e.g. Employee, Date, Territory).

Use Cases: Star schemas are ideal for straightforward reporting and querying. They are efficient for read-heavy operations, making them suitable for dashboards and summary reports.

The star schema does not necessarily have to be in the shape of a star. Below is also a star schema and is a preferred arrangement for many experts as in this arrangement is easier to visualize filter propagation from dimension to fact tables.

2. SNOWFLAKE SCHEMA

Definition: The snowflake schema is a normalized version of the star schema. In this design, dimension tables are further divided into related tables, resulting in a more complex structure.

Structure: The normalization process eliminates redundancy by splitting dimension tables into multiple related tables. This results in a web-like structure, resembling a snowflake.

Use Cases: Snowflake schemas are used in scenarios requiring detailed data models and efficient storage. They are beneficial when dealing with large datasets where data redundancy needs to be minimized.

3. GALAXIES SCHEMA(OR FACT CONSTELLATION SCHEMA)

Definition: The galaxies schema, also known as the fact constellation schema, involves multiple fact tables that share dimension tables, creating a complex, interconnected data model.

Structure: This schema consists of multiple fact tables linked to shared dimension tables, enabling the analysis of different business processes within a single model.

Use Cases: Galaxies schemas are suitable for large-scale enterprise environments where multiple related business processes need to be analyzed. They support complex queries and detailed reporting across various domains.

How Schemas Impact Power BI Performance

a. Data Storage Efficiency

b. Query Performance

c. Data Refresh Performance

Implementing Schemas in Power BI

a. Creating a Star Schema
Set Up Fact and Dimension Tables: Identify and create the central fact table and surrounding dimension tables.
Link Tables: Establish relationships between the fact table and dimension tables using foreign keys.
Optimize for Performance: Index key columns and use efficient data types to enhance query performance.

b. Implementing a Snowflake Schema
Normalize Dimension Tables: Split dimension tables into related sub-tables to reduce redundancy.
Create Relationships: Define relationships between sub-tables and the main dimension tables, ensuring referential integrity.
Optimize Storage: Use appropriate storage and indexing strategies to manage complex joins efficiently.

c. Setting Up a Galaxies Schema
Identify Fact Tables: Determine the various fact tables needed for different business processes.
Share Dimension Tables: Create shared dimension tables to link multiple fact tables.
Ensure Efficient Querying: Design the schema to support complex queries and optimize performance through indexing and data partitioning.

DEV Community