<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Dishon Gatambia (Dd)</title>
    <description>The latest articles on DEV Community by Dishon Gatambia (Dd) (@dishon_gatambiadd_31a1).</description>
    <link>https://dev.to/dishon_gatambiadd_31a1</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3708640%2F288398dc-77a5-42f9-9de4-9c1047acd98f.png</url>
      <title>DEV Community: Dishon Gatambia (Dd)</title>
      <link>https://dev.to/dishon_gatambiadd_31a1</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dishon_gatambiadd_31a1"/>
    <language>en</language>
    <item>
      <title>Connecting Power BI to SQL Databases</title>
      <dc:creator>Dishon Gatambia (Dd)</dc:creator>
      <pubDate>Tue, 24 Mar 2026 11:22:48 +0000</pubDate>
      <link>https://dev.to/dishon_gatambiadd_31a1/connecting-power-bi-to-sql-databases-288l</link>
      <guid>https://dev.to/dishon_gatambiadd_31a1/connecting-power-bi-to-sql-databases-288l</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;A practical guide to integrating Power BI Desktop with local PostgreSQL and cloud-hosted Aiven databases - including data modelling and why SQL still matters.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Introduction: Power BI and SQL Databases&lt;/li&gt;
&lt;li&gt;Connecting to a Local PostgreSQL Database&lt;/li&gt;
&lt;li&gt;Connecting to Aiven Cloud PostgreSQL&lt;/li&gt;
&lt;li&gt;Loading Tables and Creating Relationships&lt;/li&gt;
&lt;li&gt;Why SQL Skills Matter for Power BI Analysts&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  1. Introduction: Power BI and SQL Databases
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Microsoft Power BI&lt;/strong&gt; is one of the leading business intelligence platforms in use today. It enables organisations of all sizes to transform raw data into interactive dashboards, reports, and visualisations that help decision-makers act on evidence rather than intuition. From tracking monthly sales performance to monitoring operational KPIs in real time, Power BI sits at the centre of how modern businesses consume their data.&lt;/p&gt;

&lt;p&gt;Power BI is available in several forms. &lt;strong&gt;Power BI Desktop&lt;/strong&gt; is the Windows application used to build reports and data models. &lt;strong&gt;Power BI Service&lt;/strong&gt; is the cloud-based platform where those reports are published and shared across an organisation. Together, they cover the full lifecycle of analytical work - from raw data connection to executive-level dashboarding.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why connect Power BI to a database?
&lt;/h3&gt;

&lt;p&gt;While Power BI can import data from Excel files, CSV exports, and web APIs, these sources have significant limits. They are static, often out of date, and difficult to maintain at scale. A well-structured relational database, by contrast, is the authoritative source of truth for most business data. It stores transactions, customer records, inventory levels, and operational events with precision, consistency, and referential integrity.&lt;/p&gt;

&lt;p&gt;When Power BI connects directly to a database, analysts can query the freshest available data, apply complex filters at the database level, and avoid the overhead of manually exporting and re-importing flat files. The database handles storage and retrieval efficiently; Power BI handles visualisation and exploration. Each tool does what it does best.&lt;/p&gt;

&lt;h3&gt;
  
  
  The role of SQL databases in analytical workflows
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;SQL (Structured Query Language)&lt;/strong&gt; databases, including PostgreSQL, Microsoft SQL Server, and MySQL, are the backbone of most enterprise data architectures. They organise data into tables with clearly defined schemas, enforce relationships between entities, and support powerful querying through the SQL language.&lt;/p&gt;

&lt;p&gt;PostgreSQL, in particular, is an open-source relational database widely used in both development and production environments. It supports advanced data types, complex joins, window functions, and JSON storage, making it a versatile choice for analytical workloads. Whether self-hosted on a local machine or managed in the cloud through platforms like &lt;strong&gt;Aiven&lt;/strong&gt;, PostgreSQL integrates cleanly with Power BI.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Connecting to a Local PostgreSQL Database
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What this guide covers:&lt;/strong&gt; connecting Power BI to both a local PostgreSQL instance and a cloud-hosted Aiven database, data modelling with four linked tables (&lt;code&gt;customers&lt;/code&gt;, &lt;code&gt;products&lt;/code&gt;, &lt;code&gt;sales&lt;/code&gt;, and &lt;code&gt;inventory&lt;/code&gt;), and a closing discussion of why SQL fluency is valuable for BI analysts.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;A &lt;strong&gt;local PostgreSQL database&lt;/strong&gt; runs on the same machine as Power BI Desktop, or on a machine within your local network. This is the standard setup for development, testing, or environments where the data does not leave the building. The connection process requires no SSL configuration and is straightforward once PostgreSQL is running and a database exists.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;Before beginning, confirm the following are in place:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Power BI Desktop is installed (Windows only).&lt;/li&gt;
&lt;li&gt;PostgreSQL is installed, and the target database is created.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;Npgsql PostgreSQL connector&lt;/strong&gt; is installed. Power BI requires this driver to communicate with PostgreSQL. Download it from the official Npgsql releases page and install it before opening Power BI.&lt;/li&gt;
&lt;li&gt;You know the database name, a valid username, and its password.&lt;/li&gt;
&lt;/ul&gt;
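&lt;p&gt;Before opening Power BI, it is worth confirming that PostgreSQL is running and the database is reachable. A quick command-line check (the database name &lt;code&gt;assignment&lt;/code&gt; matches the example used later in this guide):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# list the tables in the target database; if this fails, Power BI will fail too
psql -h localhost -U postgres -d assignment -c "\dt"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;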

&lt;h3&gt;
  
  
  Step-by-step connection process
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Step 1 - Open Power BI Desktop&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Launch the application. On the start screen, click &lt;strong&gt;Get Data&lt;/strong&gt;. If you are already inside a report, navigate to &lt;strong&gt;Home → Get Data&lt;/strong&gt; in the ribbon.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2 - Search for PostgreSQL Database&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the &lt;em&gt;Get Data&lt;/em&gt; dialogue, type &lt;code&gt;PostgreSQL&lt;/code&gt; in the search box. Select &lt;strong&gt;PostgreSQL Database&lt;/strong&gt; from the results and click &lt;strong&gt;Connect&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0hlhp7c4bfl7epk043dy.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0hlhp7c4bfl7epk043dy.jpg" alt="Figure 1 - Selecting PostgreSQL Database in the Get Data dialogue" width="360" height="605"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3 - Enter the server and database details&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the connection dialogue, fill in two fields:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Server&lt;/strong&gt; - enter &lt;code&gt;localhost&lt;/code&gt; for a local instance, or a hostname/IP address for a network server.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database&lt;/strong&gt; - enter the exact name of the PostgreSQL database you want to connect to (for example, &lt;code&gt;assignment&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Leave the &lt;em&gt;Data Connectivity mode&lt;/em&gt; as &lt;strong&gt;Import&lt;/strong&gt; unless you specifically require DirectQuery.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcly0kzxbxtu4qknvwymt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcly0kzxbxtu4qknvwymt.png" alt="Figure 2 - Entering server name and database name for the local PostgreSQL connection" width="315" height="160"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4 - Provide credentials&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Power BI will prompt for a username and password. Select &lt;strong&gt;Database&lt;/strong&gt; under the credential type dropdown, then enter your PostgreSQL username (often &lt;code&gt;postgres&lt;/code&gt;) and the corresponding password. Click &lt;strong&gt;Connect&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5 - Select and load tables&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;Navigator&lt;/em&gt; pane will display all schemas and tables in the database. Check the tables you want to load - for example, &lt;code&gt;customers&lt;/code&gt;, &lt;code&gt;products&lt;/code&gt;, &lt;code&gt;sales&lt;/code&gt;, and &lt;code&gt;inventory&lt;/code&gt;. Click &lt;strong&gt;Load&lt;/strong&gt; to import them directly, or &lt;strong&gt;Transform Data&lt;/strong&gt; to open the Power Query Editor first.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Connection flow:

 [Power BI Desktop] - [Get Data / PostgreSQL] - [Server &amp;amp; Credentials] - [Navigator] - [Load]
      Step 1                  Step 2                    Steps 3–4           Step 5        Final
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Figure 3 - Local PostgreSQL connection flow&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Tip:&lt;/strong&gt; If the connection fails with a driver error, install the Npgsql connector and restart Power BI Desktop before trying again.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  3. Connecting to Aiven Cloud PostgreSQL
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Aiven&lt;/strong&gt; is a managed cloud database platform that hosts PostgreSQL (and other databases) on your choice of cloud provider - AWS, Google Cloud, or Azure. Connecting Power BI to an Aiven PostgreSQL instance follows the same general steps as a local connection, with two important differences: the connection details are specific to your Aiven service, and &lt;strong&gt;SSL must be used&lt;/strong&gt; to encrypt the connection.&lt;/p&gt;

&lt;h3&gt;
  
  
  Obtaining connection details from Aiven
&lt;/h3&gt;

&lt;p&gt;Log in to the &lt;a href="https://console.aiven.io" rel="noopener noreferrer"&gt;Aiven Console&lt;/a&gt; and open your PostgreSQL service. On the service overview page, you will find all the information needed to establish a connection:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;Where to find it&lt;/th&gt;
&lt;th&gt;Example value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Host&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Service Overview - Connection Information&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pg-abc123.aivencloud.com&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Port&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Next to the host, typically a custom port&lt;/td&gt;
&lt;td&gt;&lt;code&gt;15432&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Database&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Listed under the service name&lt;/td&gt;
&lt;td&gt;&lt;code&gt;defaultdb&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Username&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Connection Information section&lt;/td&gt;
&lt;td&gt;&lt;code&gt;avnadmin&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Password&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Click the eye icon or copy button&lt;/td&gt;
&lt;td&gt;&lt;em&gt;(hidden - copy directly)&lt;/em&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SSL Certificate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Download button in Connection Information&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ca.pem&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Download the &lt;strong&gt;CA certificate&lt;/strong&gt; (&lt;code&gt;ca.pem&lt;/code&gt;) and save it to a location you can reference easily, such as &lt;code&gt;C:\certs\aiven-ca.pem&lt;/code&gt; on Windows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why SSL certificates are required
&lt;/h3&gt;

&lt;p&gt;A cloud database is accessible over the public internet. Without encryption, data transmitted between Power BI and the Aiven server - including credentials and query results - would be visible to anyone monitoring the network. SSL (Secure Sockets Layer) / TLS (Transport Layer Security) encrypts the entire connection, preventing interception.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;CA certificate&lt;/strong&gt; serves a second purpose: it allows Power BI to verify that it is connecting to the genuine Aiven server and not an impostor. This is known as certificate verification, and it protects against man-in-the-middle attacks. Aiven requires SSL on all connections; it cannot be disabled.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frm4d8icc890oh83l8wmu.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frm4d8icc890oh83l8wmu.webp" alt="Aiven cloud connection with SSL/TLS encryption" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step-by-step: connecting via Power BI Desktop
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Step 1 - Open Get Data - PostgreSQL Database&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Follow the same steps as the local connection: &lt;strong&gt;Home → Get Data → PostgreSQL Database → Connect&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2 - Enter the Aiven host and port&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the &lt;strong&gt;Server&lt;/strong&gt; field, enter the full Aiven hostname followed by a colon and the port number:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pg-abc123.aivencloud.com:15432
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the &lt;strong&gt;Database&lt;/strong&gt; field, enter the database name (often &lt;code&gt;defaultdb&lt;/code&gt; unless you created a named database).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3 - Expand Advanced Options and add the SSL certificate&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the connection dialogue, click &lt;strong&gt;Advanced Options&lt;/strong&gt; to reveal additional fields. In the &lt;strong&gt;Additional connection string parameters&lt;/strong&gt; box, enter the SSL certificate path in the following format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;sslmode&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;verify-ca;sslrootcert=C:&lt;/span&gt;&lt;span class="se"&gt;\c&lt;/span&gt;&lt;span class="s"&gt;erts&lt;/span&gt;&lt;span class="se"&gt;\a&lt;/span&gt;&lt;span class="s"&gt;iven-ca.pem&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tells Power BI to use SSL, verify the server's certificate, and trust only certificates signed by the CA you downloaded.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4 - Enter credentials and connect&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Click &lt;strong&gt;OK&lt;/strong&gt;. When prompted, select &lt;strong&gt;Database&lt;/strong&gt; authentication, enter the Aiven username (typically &lt;code&gt;avnadmin&lt;/code&gt;) and password copied from the Aiven Console. Click &lt;strong&gt;Connect&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5 - Select tables in the Navigator&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Navigator pane will display the available tables. Select the required tables and click &lt;strong&gt;Load&lt;/strong&gt; or &lt;strong&gt;Transform Data&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Certificate path must use the correct syntax.&lt;/strong&gt; On Windows, if the single-backslash path shown above is not accepted, try forward slashes (for example, &lt;code&gt;C:/certs/aiven-ca.pem&lt;/code&gt;). If the &lt;code&gt;ca.pem&lt;/code&gt; file is not found, Power BI will fail to connect with an SSL handshake error. Verify the file exists at the exact path specified.&lt;/p&gt;
&lt;/blockquote&gt;
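&lt;p&gt;If the handshake error persists, the same parameters can be tested outside Power BI with &lt;code&gt;psql&lt;/code&gt;. The hostname, port, and certificate path below are the example values from the table above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# verify host, credentials, and CA certificate independently of Power BI
psql "host=pg-abc123.aivencloud.com port=15432 dbname=defaultdb user=avnadmin sslmode=verify-ca sslrootcert=C:/certs/aiven-ca.pem"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;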




&lt;h2&gt;
  
  
  4. Loading Tables and Creating Relationships
&lt;/h2&gt;

&lt;p&gt;Once the connection is established, Power BI loads the selected tables into its internal data model. For this guide, the PostgreSQL database contains four tables organised around a retail business scenario:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Table&lt;/th&gt;
&lt;th&gt;Primary key&lt;/th&gt;
&lt;th&gt;Key columns&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;customers&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;customer_id&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;first_name, last_name, email, registration_date, membership_status&lt;/td&gt;
&lt;td&gt;50 customer records with registration and membership data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;products&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;product_id&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;product_name, category, price, supplier, stock_quantity&lt;/td&gt;
&lt;td&gt;15 products across categories with pricing and supplier info&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sales&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;sale_id&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;customer_id (FK), product_id (FK), quantity_sold, sale_date, total_amount&lt;/td&gt;
&lt;td&gt;15 transaction records from 2023–2024&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;inventory&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;product_id&lt;/code&gt; (FK)&lt;/td&gt;
&lt;td&gt;stock_quantity&lt;/td&gt;
&lt;td&gt;Current stock levels for each product&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
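&lt;p&gt;The columns above imply a schema along the following lines. This DDL is an illustrative reconstruction, not a dump of the actual database; the column types are assumptions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;create table customers (
    customer_id       serial primary key,
    first_name        text not null,
    last_name         text not null,
    email             text unique,
    registration_date date,
    membership_status text
);

create table products (
    product_id     serial primary key,
    product_name   text not null,
    category       text,
    price          numeric(10, 2),
    supplier       text,
    stock_quantity integer
);

create table sales (
    sale_id       serial primary key,
    customer_id   integer references customers (customer_id),
    product_id    integer references products (product_id),
    quantity_sold integer,
    sale_date     date,
    total_amount  numeric(10, 2)
);

create table inventory (
    product_id     integer primary key references products (product_id),
    stock_quantity integer
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;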

&lt;h3&gt;
  
  
  How Power BI auto-detects relationships
&lt;/h3&gt;

&lt;p&gt;After loading these tables, Power BI may automatically detect relationships based on matching column names and data types. In this schema, it will likely identify that &lt;code&gt;sales.customer_id&lt;/code&gt; references &lt;code&gt;customers.customer_id&lt;/code&gt;, and that &lt;code&gt;sales.product_id&lt;/code&gt; references &lt;code&gt;products.product_id&lt;/code&gt;. The &lt;code&gt;inventory&lt;/code&gt; table shares &lt;code&gt;product_id&lt;/code&gt; with &lt;code&gt;products&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To inspect and manage relationships, navigate to the &lt;strong&gt;Model view&lt;/strong&gt; in Power BI Desktop, the icon that looks like three connected boxes in the left sidebar. Here you can see a visual map of all tables and the lines connecting them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fczgznlpxiz8haih3yzo7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fczgznlpxiz8haih3yzo7.png" alt="Data model showing relationships between the four tables" width="285" height="177"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating and editing relationships manually
&lt;/h3&gt;

&lt;p&gt;If Power BI does not detect relationships automatically, or if a detected relationship is incorrect, you can manage them manually. In the Model view, drag from a foreign key column in one table to the primary key in another. Power BI will draw the relationship line and ask you to confirm the cardinality and cross-filter direction.&lt;/p&gt;

&lt;p&gt;For this schema, the three key relationships are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;customers.customer_id&lt;/code&gt; - &lt;code&gt;sales.customer_id&lt;/code&gt; (one-to-many)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;products.product_id&lt;/code&gt; - &lt;code&gt;sales.product_id&lt;/code&gt; (one-to-many)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;products.product_id&lt;/code&gt; - &lt;code&gt;inventory.product_id&lt;/code&gt; (one-to-one)&lt;/li&gt;
&lt;/ul&gt;
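&lt;p&gt;Before confirming a relationship, it can be worth checking in SQL that the foreign keys actually line up. A quick orphan check against the source database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- sales rows whose customer_id has no matching customers row;
-- any results here mean the one-to-many relationship will misbehave
select s.sale_id, s.customer_id
from sales s
left join customers c on s.customer_id = c.customer_id
where c.customer_id is null;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;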

&lt;h3&gt;
  
  
  What data modelling enables
&lt;/h3&gt;

&lt;p&gt;With relationships in place, Power BI treats the tables as a unified data model rather than isolated datasets. When you build a chart showing total sales revenue by product category, Power BI knows to join &lt;code&gt;sales&lt;/code&gt; to &lt;code&gt;products&lt;/code&gt; via &lt;code&gt;product_id&lt;/code&gt; to retrieve the category name. When filtering by customer membership status, it traverses the relationship from &lt;code&gt;customers&lt;/code&gt; to &lt;code&gt;sales&lt;/code&gt; automatically.&lt;/p&gt;

&lt;p&gt;This is the core principle of &lt;strong&gt;star schema data modelling&lt;/strong&gt;: a central fact table (&lt;code&gt;sales&lt;/code&gt;) linked to dimension tables (&lt;code&gt;customers&lt;/code&gt;, &lt;code&gt;products&lt;/code&gt;) that describe the who, what, and when of each transaction. The &lt;code&gt;inventory&lt;/code&gt; table functions as a supplementary dimension providing current stock context alongside product data.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note on cross-filter direction:&lt;/strong&gt; By default, Power BI uses single-directional filtering; filters flow from the dimension table into the fact table. In most cases, this is correct. Avoid enabling bidirectional filtering unless you have a specific requirement, as it can produce unexpected aggregation results and slow report performance.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  5. Why SQL Skills Matter for Power BI Analysts
&lt;/h2&gt;

&lt;p&gt;Power BI's graphical interface makes it possible to build dashboards without writing a single line of SQL. But analysts who understand SQL bring a fundamentally different level of capability to their work. SQL is not a requirement for using Power BI; it is a requirement for using it &lt;strong&gt;well&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"An analyst who can write SQL is not just faster, they understand the data at a structural level that shapes every design decision they make in Power BI."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Four ways SQL strengthens Power BI work
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Precise data retrieval&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Rather than loading an entire table and filtering inside Power BI, an analyst with SQL knowledge writes a query that retrieves only the rows and columns needed. This reduces memory usage, speeds up refresh times, and keeps the data model lean.&lt;/p&gt;
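&lt;p&gt;For example, a report that only slices customers by membership has no need for the full table; a narrower query keeps the model lean:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- import only the columns the report actually uses, not select *
select customer_id, first_name, last_name, membership_status
from customers;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;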

&lt;p&gt;&lt;strong&gt;2. Filtering at the source&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;SQL &lt;code&gt;WHERE&lt;/code&gt; clauses filter data before it reaches Power BI. An analyst who understands this can avoid importing years of historical records when only the past 12 months are relevant to the dashboard; at scale, that is a significant difference.&lt;/p&gt;
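&lt;p&gt;For instance, restricting the &lt;code&gt;sales&lt;/code&gt; import to the trailing 12 months using PostgreSQL date arithmetic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- filter at the source so old history never leaves the database
select sale_id, customer_id, product_id, quantity_sold, sale_date, total_amount
from sales
where sale_date &gt;= current_date - interval '12 months';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;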

&lt;p&gt;&lt;strong&gt;3. Pre-aggregation and joins&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Complex calculations, such as monthly revenue per customer segment or average order value by product category, can be computed in SQL before the data is loaded. This offloads processing to the database engine, which handles large aggregations far more efficiently than Power BI's in-memory model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Data preparation and quality&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;SQL lets analysts clean and reshape data at the source (standardising date formats, handling nulls, concatenating name fields, or pivoting rows into columns) before the data ever reaches Power Query. This keeps the Power BI model simple and the transformation logic auditable.&lt;/p&gt;
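&lt;p&gt;A sketch of this kind of source-side cleaning against the &lt;code&gt;customers&lt;/code&gt; table (the specific transformations are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;select
    customer_id,
    concat(first_name, ' ', last_name)       as customer_name,   -- merge name fields
    lower(email)                             as email,           -- standardise casing
    coalesce(membership_status, 'unknown')   as membership_status, -- handle nulls
    to_char(registration_date, 'YYYY-MM-DD') as registration_date -- fix date format
from customers;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;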

&lt;h3&gt;
  
  
  SQL in the context of this schema
&lt;/h3&gt;

&lt;p&gt;With the &lt;code&gt;assignment&lt;/code&gt; database used throughout this guide, a Power BI analyst who understands SQL can write queries like the one below to pre-aggregate sales data before loading it, rather than importing all 15 raw transaction rows and computing totals inside DAX:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt;
    &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;membership_status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;revenue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sale_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;transactions&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="k"&gt;inner&lt;/span&gt; &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt;
&lt;span class="k"&gt;inner&lt;/span&gt; &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;membership_status&lt;/span&gt;
&lt;span class="k"&gt;order&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;revenue&lt;/span&gt; &lt;span class="k"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This query produces a compact summary table: joined, grouped, and sorted, ready for Power BI to visualise. An analyst who cannot write this query must load three raw tables, build the join in Power Query, and compute the aggregations with DAX. The result is the same, but the path is longer, more error-prone, and harder to debug.&lt;/p&gt;

&lt;h3&gt;
  
  
  Summary: SQL and Power BI as complementary layers
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Responsibility&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Storage &amp;amp; retrieval&lt;/td&gt;
&lt;td&gt;PostgreSQL&lt;/td&gt;
&lt;td&gt;Tables, indexes, relationships, data integrity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transformation&lt;/td&gt;
&lt;td&gt;SQL&lt;/td&gt;
&lt;td&gt;Filtering, joining, aggregating, cleaning at source&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Modeling&lt;/td&gt;
&lt;td&gt;Power BI Desktop&lt;/td&gt;
&lt;td&gt;Star schema, relationships, DAX measures&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Visualisation&lt;/td&gt;
&lt;td&gt;Power BI Desktop / Service&lt;/td&gt;
&lt;td&gt;Dashboards, reports, interactive charts&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;SQL and Power BI are not competing tools; they are complementary layers in the same analytical pipeline. SQL handles structured retrieval and transformation at the database level. Power BI handles interactive visualisation and self-service exploration at the consumer level. Fluency in both means the analyst decides, with full awareness, where each operation belongs.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Data Engineering &amp;amp; BI · Technical Guide · PostgreSQL · Aiven · Power BI Desktop&lt;/em&gt;&lt;/p&gt;

</description>
      <category>database</category>
      <category>postgres</category>
      <category>sql</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>SQL ANALYTICAL MECHANICS: JOINS &amp; WINDOW FUNCTIONS</title>
      <dc:creator>Dishon Gatambia (Dd)</dc:creator>
      <pubDate>Sun, 08 Mar 2026 15:17:34 +0000</pubDate>
      <link>https://dev.to/dishon_gatambiadd_31a1/sql-analytical-mechanics-joins-window-functions-pc4</link>
      <guid>https://dev.to/dishon_gatambiadd_31a1/sql-analytical-mechanics-joins-window-functions-pc4</guid>
      <description>&lt;p&gt;Structured Query Language (SQL) is the fundamental protocol for relational database management and data retrieval. Sophisticated analysis relies on two primary mechanisms: &lt;strong&gt;Joins&lt;/strong&gt; and &lt;strong&gt;Window&lt;/strong&gt; functions. Joins allow you to combine data from multiple tables, while window functions enable advanced calculations over subsets of data without collapsing rows. This technical overview details implementation strategies and optimisation for practitioners with foundational SQL knowledge.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Primary Mechanism&lt;/th&gt;
&lt;th&gt;Result Set Impact&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Joins&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Horizontal merging of distinct tables via shared keys.&lt;/td&gt;
&lt;td&gt;Alters row count and structure based on matching logic.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Window Functions&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Analytical computations across defined row subsets (windows).&lt;/td&gt;
&lt;td&gt;Preserves original row count; appends calculated data.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Core Principles
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Joins&lt;/strong&gt;: Essential for reconstructing normalized data structures. Operations include &lt;code&gt;INNER&lt;/code&gt;, &lt;code&gt;LEFT&lt;/code&gt;, &lt;code&gt;RIGHT&lt;/code&gt;, and &lt;code&gt;FULL OUTER&lt;/code&gt; joins to define the scope of the intersection or union.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Window Functions&lt;/strong&gt;: Utilised for ranking (&lt;code&gt;RANK()&lt;/code&gt;), running totals (&lt;code&gt;SUM() OVER&lt;/code&gt;), and time-series analysis (&lt;code&gt;LAG&lt;/code&gt;/&lt;code&gt;LEAD&lt;/code&gt;). The &lt;code&gt;OVER&lt;/code&gt; clause defines the window through &lt;code&gt;PARTITION BY&lt;/code&gt; and &lt;code&gt;ORDER BY&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What a Join actually does
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;JOIN&lt;/strong&gt; is the mechanism by which a relational database combines rows from two or more tables based on a related column. The engine performs a logical comparison between rows of a left and right dataset, then determines which row combinations satisfy the specified condition.&lt;br&gt;
The key insight: a &lt;strong&gt;JOIN&lt;/strong&gt; does not "&lt;em&gt;add columns to a table&lt;/em&gt;." It builds a new, temporary result set. Every row in that result is a pairing of rows from the participating tables. How many rows get included, and from which side, depends on the join type. Without joins, you'd be limited to single-table queries, which rarely suffice in real-world scenarios like e-commerce databases or customer relationship management systems.&lt;/p&gt;
&lt;h2&gt;
  
  
  Types of Joins
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Inner Joins
&lt;/h3&gt;

&lt;p&gt;Returns only the rows where the join condition is satisfied in both tables. Rows that have no match on either side are excluded entirely.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select e.name, d.department_name
from employees e 
inner join 
departments d on e.department_id = d.department_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What's happening&lt;/strong&gt; - For every row in the employees table, the query looks for matching rows in the departments table. If no match exists, that employee does not appear in the output. If an employee's department_id matches three rows in departments, that employee appears three times - one output row per pairing.&lt;/p&gt;
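The matching behaviour above can be sketched with Python's built-in sqlite3 module; the tables and sample rows here are invented for illustration, not from the article.

```python
# Minimal INNER JOIN demo: unmatched rows are dropped from the result.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER, department_name TEXT);
    CREATE TABLE employees   (name TEXT, department_id INTEGER);
    INSERT INTO departments VALUES (1, 'Engineering'), (2, 'Sales');
    INSERT INTO employees   VALUES ('Amina', 1), ('Brian', 2), ('Cate', 99);
""")

# Cate's department_id (99) has no match in departments, so she is excluded.
rows = conn.execute("""
    SELECT e.name, d.department_name
    FROM employees e
    INNER JOIN departments d ON e.department_id = d.department_id
    ORDER BY e.name
""").fetchall()

print(rows)  # [('Amina', 'Engineering'), ('Brian', 'Sales')]
```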

&lt;h3&gt;
  
  
  Left Join (Left Outer Join)
&lt;/h3&gt;

&lt;p&gt;Returns all rows from the left table, and the matched rows from the right table. Where no match exists on the right, &lt;em&gt;NULL&lt;/em&gt; fills the right-side columns.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select e.name, d.department_name
from employees e
left join departments d on e.department_id = d.department_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;When to use it&lt;/strong&gt;: Any time "missing" data is meaningful. If d.department_name is NULL, you know that employee has no assigned department. This is the join type to use for finding gaps.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;--Find employees with no assigned departments
select e.name, d.department_name
from departments d 
left join employees e on d.department_id = e.department_id
where d.department_name = NULL;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This returns employees with no assigned departments. &lt;/p&gt;
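A runnable sketch of this LEFT JOIN anti-join pattern, again using sqlite3 with invented data. It also demonstrates a classic pitfall: `= NULL` never matches in SQL because NULL comparisons evaluate to unknown, so the filter must use `IS NULL`.

```python
# LEFT JOIN + IS NULL finds unmatched rows; "= NULL" silently finds nothing.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER, department_name TEXT);
    CREATE TABLE employees   (name TEXT, department_id INTEGER);
    INSERT INTO departments VALUES (1, 'Engineering');
    INSERT INTO employees   VALUES ('Amina', 1), ('Brian', NULL);
""")

base = """
    SELECT e.name
    FROM employees e
    LEFT JOIN departments d ON e.department_id = d.department_id
    WHERE d.department_id {}
"""
print(conn.execute(base.format("= NULL")).fetchall())   # [] -- never matches
print(conn.execute(base.format("IS NULL")).fetchall())  # [('Brian',)]
```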

&lt;h3&gt;
  
  
  Right Join
&lt;/h3&gt;

&lt;p&gt;The mirror of &lt;strong&gt;LEFT JOIN&lt;/strong&gt;. All rows from the right table are preserved; &lt;em&gt;NULLs&lt;/em&gt; appear on the left where no match exists. In practice, most engineers rewrite RIGHT JOINs as LEFT JOINs by reversing table order because it's cleaner.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;--Finding employees and all departments (including Empty departments)
select e.name, d.department_name
from employees e 
right join departments d on e.department_id = d.department_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Full Outer Join
&lt;/h3&gt;

&lt;p&gt;Returns all rows from both tables. Where no match exists on either side, NULLs fill the unmatched columns. This is the union of the left and right outer joins: in short, it shows every row from both tables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;--Shows all rows from both tables
select e.name, d.department_name
from employees e 
full outer join departments d on e.department_id = d.department_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;When to use it&lt;/strong&gt;: Data reconciliation, audits, finding orphaned records on either side.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cross Join
&lt;/h3&gt;

&lt;p&gt;Produces the Cartesian product. Every row in the left table is paired with every row in the right table. No ON condition.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;--Every employee with every project 
select e.name, p.project_name 
from employees e 
cross join projects p;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A table with 10 rows crossed with a table of 5 rows produces 50 rows. Use deliberately. Running a CROSS JOIN on two large tables without a WHERE clause can produce billions of rows.&lt;/p&gt;
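The row-count arithmetic above is easy to verify with sqlite3 (tables and sizes invented for the check):

```python
# A CROSS JOIN of a 10-row table with a 5-row table yields 10 * 5 = 50 rows.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER)")
conn.execute("CREATE TABLE projects  (id INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?)", [(i,) for i in range(10)])
conn.executemany("INSERT INTO projects  VALUES (?)", [(i,) for i in range(5)])

(count,) = conn.execute(
    "SELECT COUNT(*) FROM employees CROSS JOIN projects"
).fetchone()
print(count)  # 50
```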

&lt;h3&gt;
  
  
  Self Join
&lt;/h3&gt;

&lt;p&gt;A table joined to itself. The table is aliased twice to distinguish which "instance" is being referenced.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;--Employee who is a manager 
select distinct m.name as manager 
from employees e 
join employees m on e.manager_id = m.employee_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnhrrdefjm4ym0clpn7oi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnhrrdefjm4ym0clpn7oi.png" alt="Illustration of JOINs" width="800" height="386"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Use cases for Joins
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reporting&lt;/strong&gt;: Combine customers' details from a customers table with their orders from an orders table to generate sales reports.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Integration&lt;/strong&gt;: Merge data from disparate sources, such as user profiles and activity logs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Cleaning&lt;/strong&gt;: Identify and handle missing relationships, like orphaned records.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Best practices include using aliases for table names to improve readability, indexing join columns for performance, and avoiding unnecessary joins to prevent query slowdowns.&lt;/p&gt;

&lt;h2&gt;
  
  
  Window Functions
&lt;/h2&gt;

&lt;p&gt;Window functions, introduced in the SQL:2003 standard and supported by most modern databases, including PostgreSQL, MySQL (from version 8.0), and SQL Server, perform calculations across a set of rows related to the current row. Unlike aggregate functions (e.g., &lt;code&gt;SUM()&lt;/code&gt;, &lt;code&gt;AVG()&lt;/code&gt;) that group rows and reduce output, window functions maintain the original row count while adding computed columns.&lt;br&gt;
With &lt;code&gt;GROUP BY + SUM()&lt;/code&gt;, you get one row per group. With a window function &lt;code&gt;SUM() OVER (PARTITION BY ...)&lt;/code&gt;, every original row survives in the output, but each row gains a new column containing the aggregate value computed over its "window".&lt;br&gt;
A window function is defined using the &lt;code&gt;OVER()&lt;/code&gt; clause, which specifies the "window" or partition of rows to operate on. It can include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PARTITION BY&lt;/strong&gt;: Divides the result set into partitions (groups) where the function is applied independently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ORDER BY&lt;/strong&gt;: Sorts rows within each partition.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frame Clause&lt;/strong&gt;: Defines a subset of the partition (e.g., ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW for running totals).&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Partition By
&lt;/h3&gt;

&lt;p&gt;Divides the result set into groups (partitions). The window function restarts its calculation for each partition. This is analogous to &lt;code&gt;GROUP BY&lt;/code&gt;, except the rows are not collapsed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Total sales per region, shown on every row
select
    region,
    salesperson,
    sales_amount,
    sum(sales_amount) over (partition by region) as region_total
from sales;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output will have one row per salesperson, but &lt;code&gt;region_total&lt;/code&gt; shows the sum for all salespeople in that region.&lt;/p&gt;
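This row-preserving behaviour can be checked with sqlite3 (requires SQLite 3.25+ for window functions; table and values invented):

```python
# PARTITION BY: three rows in, three rows out; the regional sum repeats
# on every row of its partition instead of collapsing the group.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, salesperson TEXT, sales_amount INTEGER);
    INSERT INTO sales VALUES
        ('East', 'Ann', 100), ('East', 'Ben', 200), ('West', 'Cy', 50);
""")

rows = conn.execute("""
    SELECT region, salesperson, sales_amount,
           SUM(sales_amount) OVER (PARTITION BY region) AS region_total
    FROM sales
    ORDER BY region, salesperson
""").fetchall()

print(rows)  # both East rows carry the East total of 300
```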

&lt;h3&gt;
  
  
  ORDER BY Inside OVER
&lt;/h3&gt;

&lt;p&gt;When ORDER BY is specified inside OVER(), the window function becomes aware of row sequence within each partition. For ranking functions, this determines rank. For running totals, it defines the cumulative direction.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Running total of sales per region, ordered by date
select
    region,
    sale_date,
    sales_amount,
    sum(sales_amount) over (
        partition by region
        order by sale_date
    ) as running_total
FROM sales;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Types of Window functions
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Ranking functions
&lt;/h2&gt;

&lt;p&gt;Ranking functions assign a rank or sequential number to each row within its partition.&lt;/p&gt;

&lt;h3&gt;
  
  
  Row_Number()
&lt;/h3&gt;

&lt;p&gt;Assigns a unique sequential integer to each row within a partition. Ties receive different numbers (non-deterministic unless the ORDER BY is fully deterministic).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Rank employees by salary within each department
select
    department,
    employee_name,
    salary,
    row_number() over (
        partition by department
        order by salary desc
    ) as row_num
from employees;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Use case&lt;/strong&gt;: Deduplication - filter &lt;code&gt;WHERE row_num = 1&lt;/code&gt; in an outer query (window functions cannot appear directly in &lt;code&gt;WHERE&lt;/code&gt;) to keep only the highest-paid employee per department.&lt;/p&gt;
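A minimal sketch of that deduplication idiom in sqlite3, with invented data; the window function runs in a subquery so the outer query can filter on it:

```python
# Keep only the top earner per department via ROW_NUMBER() in a subquery.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (department TEXT, employee_name TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Eng', 'Amina', 120), ('Eng', 'Brian', 90), ('Sales', 'Cate', 80);
""")

top = conn.execute("""
    SELECT department, employee_name, salary
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY department ORDER BY salary DESC
               ) AS row_num
        FROM employees
    )
    WHERE row_num = 1
    ORDER BY department
""").fetchall()

print(top)  # [('Eng', 'Amina', 120), ('Sales', 'Cate', 80)]
```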

&lt;h3&gt;
  
  
  Rank()
&lt;/h3&gt;

&lt;p&gt;Like ROW_NUMBER(), but ties receive the same rank, and the next rank skips. A tie at rank 2 means the next rank is 4 (not 3).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select
    department,
    employee_name,
    salary,
    rank() over (
        partition by department
        order by salary DESC
    ) as salary_rank
from employees;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Dense_Rank()
&lt;/h3&gt;

&lt;p&gt;Like RANK(), but no gaps after ties. A tie at rank 2 means the next rank is 3.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select
    department,
    employee_name,
    salary,
    dense_rank() over (
        partition by department
        order by salary desc
    ) as dense_rank
from employees;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Salary&lt;/th&gt;
&lt;th&gt;ROW_NUMBER&lt;/th&gt;
&lt;th&gt;RANK&lt;/th&gt;
&lt;th&gt;DENSE_RANK&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;90&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
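The tie-handling table above can be reproduced directly (salaries 100, 100, 90; sqlite3 demo with invented data):

```python
# ROW_NUMBER vs RANK vs DENSE_RANK on tied values.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?)", [(100,), (100,), (90,)])

rows = conn.execute("""
    SELECT salary,
           ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn,
           RANK()       OVER (ORDER BY salary DESC) AS rk,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS dk
    FROM employees
    ORDER BY salary DESC, rn
""").fetchall()

print(rows)  # [(100, 1, 1, 1), (100, 2, 1, 1), (90, 3, 3, 2)]
```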

&lt;h3&gt;
  
  
  NTILE(n)
&lt;/h3&gt;

&lt;p&gt;Divides the partition into &lt;code&gt;n&lt;/code&gt; buckets of as near-equal size as possible (when rows don't divide evenly, the larger buckets come first) and assigns a bucket number to each row.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Divide customers into quartiles by lifetime value
select
    customer_id,
    lifetime_value,
    ntile(4) over (order by lifetime_value desc) as value_quartile
from customers;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  LAG() and LEAD()
&lt;/h3&gt;

&lt;p&gt;Access values from a previous or subsequent row within the partition, without a self-join.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;LAG()&lt;/code&gt;: Returns a value from a previous row.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;LEAD()&lt;/code&gt;: Returns a value from a subsequent row.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Month-over-month revenue change
select
    month,
    revenue,
    lag(revenue, 1) over (order by month) as prev_month_revenue,
    revenue - lag(revenue, 1) over (order by month) as change
from monthly_revenue;

-- Pull the next month's revenue onto the current row
select
    month,
    revenue,
    lead(revenue, 1) over (order by month) as next_month_revenue
from monthly_revenue;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
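The month-over-month pattern can be sketched end to end with sqlite3; the monthly_revenue table and its figures are invented. Note that the first month has no predecessor, so LAG() and the derived change are NULL:

```python
# LAG(): month-over-month change without a self-join.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monthly_revenue (month TEXT, revenue INTEGER);
    INSERT INTO monthly_revenue VALUES
        ('2025-01', 100), ('2025-02', 130), ('2025-03', 120);
""")

rows = conn.execute("""
    SELECT month, revenue,
           LAG(revenue, 1) OVER (ORDER BY month) AS prev_month_revenue,
           revenue - LAG(revenue, 1) OVER (ORDER BY month) AS change
    FROM monthly_revenue
    ORDER BY month
""").fetchall()

print(rows)
# [('2025-01', 100, None, None), ('2025-02', 130, 100, 30), ('2025-03', 120, 130, -10)]
```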



&lt;h3&gt;
  
  
  FIRST_VALUE() and LAST_VALUE()
&lt;/h3&gt;

&lt;p&gt;Return the first or last value in an ordered window frame.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Show each sale alongside the first and most recent sale in that region
select
    region,
    sale_date,
    sales_amount,
    first_value(sales_amount) over (
        partition by region
        order by sale_date
    ) as first_sale,
    last_value(sales_amount) over (
        partition by region
        order by sale_date
        rows between unbounded preceding and unbounded following
    ) as last_sale
from sales;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Important&lt;/strong&gt;: &lt;code&gt;LAST_VALUE()&lt;/code&gt; requires an explicit frame clause &lt;code&gt;(ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)&lt;/code&gt;. Without it, the default frame ends at the current row; so &lt;code&gt;LAST_VALUE()&lt;/code&gt; would just return the current row's value.&lt;/p&gt;
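That frame caveat is observable in a few lines of sqlite3 (data invented): without the explicit frame, the first row's "last value" is just its own value, because its default window ends at the current row.

```python
# LAST_VALUE() with the default frame vs an explicit unbounded frame.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sale_date TEXT, sales_amount INTEGER);
    INSERT INTO sales VALUES ('2025-01-01', 10), ('2025-01-02', 30);
""")

rows = conn.execute("""
    SELECT sale_date,
           LAST_VALUE(sales_amount) OVER (ORDER BY sale_date) AS default_frame,
           LAST_VALUE(sales_amount) OVER (
               ORDER BY sale_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS full_frame
    FROM sales
    ORDER BY sale_date
""").fetchall()

print(rows)  # [('2025-01-01', 10, 30), ('2025-01-02', 30, 30)]
```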

&lt;h3&gt;
  
  
  Window Frames: ROWS vs RANGE
&lt;/h3&gt;

&lt;p&gt;The frame clause defines exactly which rows within the partition are included in the calculation relative to the current row.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Rolling 3-row average (current row + 2 preceding)
select
    sale_date,
    sales_amount,
    avg(sales_amount) over (
        order by sale_date
        rows between 2 preceding and current row
    ) as rolling_3_avg
from sales;

-- Rolling 7-day average (based on value range, not row count)
select
    sale_date,
    sales_amount,
    avg(sales_amount) over (
        order by sale_date
        range between interval '6 days' preceding and current row
    ) as rolling_7day_avg
from sales;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ROWS vs RANGE:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ROWS&lt;/strong&gt;: physical row count&lt;br&gt;
&lt;strong&gt;RANGE&lt;/strong&gt;: value-based range (treats tied ORDER BY values as the same position)&lt;/p&gt;
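The ROWS frame is straightforward to verify with sqlite3 (invented data; the RANGE-with-interval variant is PostgreSQL-style and not shown here, as SQLite does not support date-interval frames):

```python
# Rolling 3-row average: current row plus up to two preceding rows.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sale_date TEXT, sales_amount INTEGER);
    INSERT INTO sales VALUES
        ('2025-01-01', 10), ('2025-01-02', 20), ('2025-01-03', 30), ('2025-01-04', 40);
""")

rows = conn.execute("""
    SELECT sale_date, sales_amount,
           AVG(sales_amount) OVER (
               ORDER BY sale_date
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS rolling_3_avg
    FROM sales
    ORDER BY sale_date
""").fetchall()

print(rows)  # rolling averages: 10.0, 15.0, 20.0, 30.0
```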

&lt;h2&gt;
  
  
  Concept in plain terms
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;JOIN&lt;/strong&gt; is the act of stitching tables together. A relational database stores data in separate tables deliberately; orders in one table, customers in another, to avoid redundancy. JOINs are how you reassemble that normalised data into the shape a query needs. The join type controls what happens when no match exists: INNER drops the row, OUTER preserves it with NULLs.&lt;br&gt;
&lt;strong&gt;Window functions&lt;/strong&gt; are the answer to "I need an aggregate, but I don't want to lose the detail rows." The OVER() clause tells the database: compute this value, but do it within a sliding context tied to each row. The result is a hybrid row-level detail that coexists with group-level calculations in the same output. This makes complex analytics expressible in a single query that would otherwise require multiple subqueries, self-joins, or application-level processing.&lt;br&gt;
The two features address different problems. JOINs control the shape and completeness of the dataset. Window functions compute derived metrics across that dataset without changing its grain. Used together, they handle a wide range of analytical query patterns efficiently and without procedural code.&lt;/p&gt;

</description>
      <category>analytics</category>
      <category>data</category>
      <category>database</category>
      <category>sql</category>
    </item>
    <item>
      <title>How Analysts Translate Messy Data, DAX, and Dashboards into Action Using Power BI</title>
      <dc:creator>Dishon Gatambia (Dd)</dc:creator>
      <pubDate>Sun, 15 Feb 2026 12:03:45 +0000</pubDate>
      <link>https://dev.to/dishon_gatambiadd_31a1/how-analysts-translate-messy-data-dax-and-dashboards-into-action-using-power-bi-1l3</link>
      <guid>https://dev.to/dishon_gatambiadd_31a1/how-analysts-translate-messy-data-dax-and-dashboards-into-action-using-power-bi-1l3</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Data analysts utilise Power BI to convert fragmented, unorganised data into strategic business intelligence by managing a complete workflow of preparation, calculation, and visual communication. &lt;/p&gt;

&lt;h2&gt;
  
  
  Data Ingestion and Transformation
&lt;/h2&gt;

&lt;p&gt;Real-world data is typically disorganised, arriving with inconsistent formatting, missing values, and structural issues across various sources like ERP databases, spreadsheets, and APIs. Analysts use Power Query and the M language to build reproducible ETL (Extract, Transform, Load) pipelines. This process includes: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Structural Normalisation&lt;/strong&gt;: Tasks such as unpivoting tables, merging queries, and handling null values prepare data without needing database administrator assistance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schema Enforcement&lt;/strong&gt;: Converting data types (e.g., text to numeric) and standardising date formats to ISO 8601 prevents errors in later calculations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Relationship Modelling&lt;/strong&gt;: Analysts organise data into star schemas, where fact tables containing quantitative data connect to descriptive dimension tables via defined cardinalities&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  DAX: The Analytical Layer
&lt;/h2&gt;

&lt;p&gt;Data Analysis Expressions (DAX) allow analysts to move beyond basic table structures to perform dynamic calculations.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Calculated Columns vs. Measures&lt;/strong&gt;: Calculated columns are pre-computed and stored in memory for row-level categorisation.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Customer Segment = 
SWITCH(
    TRUE(),
    Sales[Total Amount] &amp;gt; 10000, "Enterprise",
    Sales[Total Amount] &amp;gt; 1000, "Mid-Market",
    "SMB"
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In contrast, measures are dynamic aggregations that compute only when filtered by visuals or slicers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Total Revenue = 
SUMX(
    Sales,
    Sales[Quantity] * Sales[Unit Price]
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context Manipulation&lt;/strong&gt;: The CALCULATE function is used to override or modify existing filters, enabling advanced comparisons like Year-over-Year (YoY) growth or high-value customer identification.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Revenue Previous Year = 
CALCULATE(
    [Total Revenue],
    DATEADD(Calendar[Date], -1, YEAR)
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;FILTER creates row contexts for granular conditional aggregation&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;High Value Customers Revenue = 
CALCULATE(
    [Total Revenue],
    FILTER(
        Customer,
        [Total Revenue] &amp;gt; 5000
    )
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Time Intelligence&lt;/strong&gt;: Specialised functions allow for calculations like Year-to-Date (YTD) revenue, provided a contiguous calendar table is present.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;YTD Revenue = 
TOTALYTD(
    [Total Revenue],
    Calendar[Date]
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Dashboard Construction and Interaction
&lt;/h2&gt;

&lt;p&gt;Dashboards serve as the interface that reduces cognitive load for stakeholders. Effective design relies on choosing the correct visual for the data type:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Visual Selection:&lt;/strong&gt; Bar charts are used for category rankings, line charts for temporal trends, and matrices for hierarchical drill-downs. Single KPI values are highlighted using card visuals with variance indicators.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interactivity&lt;/strong&gt;: Analysts configure how visuals interact—through cross-filtering or highlighting—and use bookmarks or parameters to enable "what-if" analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Translation to Action
&lt;/h2&gt;

&lt;p&gt;The final stage of the analytical workflow is converting observations into organisational impact.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pattern Recognition&lt;/strong&gt;: Analysts identify critical deviations, such as revenue falling below forecasts or elevated churn rates in specific segments.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Drill-Down Capabilities&lt;/strong&gt;: Tooltips and drill-through features allow users to investigate the raw transactions behind aggregate numbers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decision Support&lt;/strong&gt;: By sharing these insights through the Power BI Service or embedded reports, analysts provide decision-makers with the evidence needed to optimise resources, reduce costs, or expand markets.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Power BI empowers analysts to transform messy data into actionable intelligence through a seamless workflow of cleaning, calculating, visualising, and sharing. By mastering Power Query, DAX, and dashboard design, analysts bridge the gap between data and decision-making, driving organisational success.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>HOSPITAL &amp; PHARMACY DATA ANALYSIS</title>
      <dc:creator>Dishon Gatambia (Dd)</dc:creator>
      <pubDate>Sun, 15 Feb 2026 10:58:29 +0000</pubDate>
      <link>https://dev.to/dishon_gatambiadd_31a1/hospital-pharmacy-data-analysis-4eoi</link>
      <guid>https://dev.to/dishon_gatambiadd_31a1/hospital-pharmacy-data-analysis-4eoi</guid>
      <description>&lt;h2&gt;
  
  
  Data Cleaning
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Set every column to the correct data type, e.g. ID columns to text, dates to date, numbers to whole number, text fields to text, and unit price and cost to fixed decimal number. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Trimmed&lt;/strong&gt; and &lt;strong&gt;cleaned&lt;/strong&gt; all the columns&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;There were no missing values or errors to correct&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Removed any duplicate &lt;em&gt;Patient ID, Visit ID &amp;amp; Transaction ID&lt;/em&gt; values in the &lt;em&gt;Patients, Visits &amp;amp; Pharmacy Transactions&lt;/em&gt; tables respectively &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Capitalised&lt;/strong&gt; each word in cells&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scanned the data for negative values in age, quantity &amp;amp; cost; none were found&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Added a validation column, &lt;em&gt;Calculated total cost&lt;/em&gt; = &lt;em&gt;Unit price&lt;/em&gt; * &lt;em&gt;Quantity&lt;/em&gt;. It matched the given &lt;em&gt;total cost&lt;/em&gt; on every row, so the new column was deleted&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Data Modelling
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;For this dataset, my &lt;strong&gt;fact tables&lt;/strong&gt; are: &lt;em&gt;Visits and Pharmacy transactions&lt;/em&gt; as they contain quantitative data while my &lt;strong&gt;dimension table&lt;/strong&gt; is &lt;em&gt;patients&lt;/em&gt; as it contains descriptive data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;strong&gt;unique identifiers&lt;/strong&gt; are &lt;em&gt;patientID&lt;/em&gt; in the &lt;em&gt;patients&lt;/em&gt; table, &lt;em&gt;visitID&lt;/em&gt; in &lt;em&gt;visits&lt;/em&gt;, and &lt;em&gt;transactionID&lt;/em&gt; in &lt;em&gt;pharmacy transactions&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To create the &lt;strong&gt;relationships&lt;/strong&gt;, link &lt;em&gt;patientID&lt;/em&gt; in the &lt;em&gt;patients&lt;/em&gt; and &lt;em&gt;visits&lt;/em&gt; tables. This is a many-to-one relationship from &lt;em&gt;visits&lt;/em&gt; to &lt;em&gt;patients&lt;/em&gt;: one patient can have many visits in the hospital. For &lt;em&gt;pharmacy transactions&lt;/em&gt; to &lt;em&gt;visits&lt;/em&gt;, the connecting column is &lt;em&gt;visitID&lt;/em&gt;, again many-to-one: one visit can have many transactions, but not vice versa.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Created another table called &lt;em&gt;DimDrug&lt;/em&gt; that contains &lt;em&gt;Drug Name, Drug Category &amp;amp; Drug ID&lt;/em&gt;. Referenced the &lt;em&gt;pharmacy transactions&lt;/em&gt; table and went ahead to delete the other columns and remained with &lt;em&gt;Drug Name &amp;amp; Drug Category&lt;/em&gt;. Added an indexed column and formatted a prefix "DRG" to come up with the column &lt;em&gt;Drug ID&lt;/em&gt; eg "DRG 1". Removed duplicates hence remained with six drug names and drug categories. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Created another table &lt;em&gt;DimDate&lt;/em&gt; from the referenced table &lt;em&gt;Visits&lt;/em&gt;. Deleted all columns except the &lt;em&gt;visit date&lt;/em&gt; column, then added columns from &lt;strong&gt;date &amp;amp; time&lt;/strong&gt; to return &lt;em&gt;month, month name, quarter &amp;amp; year&lt;/em&gt; &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Created 2 tables from the &lt;em&gt;visits&lt;/em&gt; table namely &lt;em&gt;FacVisits &amp;amp; DimVisits&lt;/em&gt;. &lt;em&gt;FacVisits&lt;/em&gt; is a fact table containing &lt;em&gt;patientID, VisitID, Visit date &amp;amp; Length of stay days&lt;/em&gt; these are measurable data hence fact table. The &lt;em&gt;DimVisits&lt;/em&gt; table contains &lt;em&gt;VisitID, Diagnosis &amp;amp; Department&lt;/em&gt;, these are descriptive data hence dimension table&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In the &lt;em&gt;Pharmacy Transactions&lt;/em&gt; table, added &lt;em&gt;Drug ID&lt;/em&gt; by adding a &lt;strong&gt;conditional column&lt;/strong&gt; and using an &lt;strong&gt;IF &amp;amp; Else&lt;/strong&gt; statement that returns the correct drug ID paired to the drug name from the &lt;em&gt;DimDrug&lt;/em&gt; table. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Merged queries using &lt;em&gt;visitID&lt;/em&gt;, present in both the &lt;em&gt;Pharmacy Transactions &amp;amp; Visits&lt;/em&gt; tables, using a &lt;strong&gt;left outer merge&lt;/strong&gt;. Added &lt;em&gt;Patient_ID&lt;/em&gt; from the &lt;em&gt;Visits&lt;/em&gt; table into &lt;em&gt;Pharmacy Transactions&lt;/em&gt;. From &lt;em&gt;Patients&lt;/em&gt; to &lt;em&gt;Pharmacy transactions&lt;/em&gt;, &lt;em&gt;Patient ID&lt;/em&gt; forms a one-to-many relationship.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For the relationships:&lt;br&gt;
&lt;em&gt;(Visit_ID)Visit table&lt;/em&gt; - &lt;em&gt;(Visit_ID)Pharmacy transactions&lt;/em&gt; = one-many&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;(Visit_ID)FacVisits&lt;/em&gt; - &lt;em&gt;(Visit_ID)Pharmacy transactions&lt;/em&gt; = one-many&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;(Visit_ID)DimVisits&lt;/em&gt; - &lt;em&gt;(Visit_ID)Pharmacy transactions&lt;/em&gt; = one-many&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;(Visit_ID)DimDate&lt;/em&gt; - &lt;em&gt;(Visit_ID)Pharmacy transactions&lt;/em&gt; = one-many&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;(Drug_ID)DimDrug&lt;/em&gt; - &lt;em&gt;(Drug_ID)Pharmacy transactions&lt;/em&gt; = one-many&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;(Patient_ID)Patient&lt;/em&gt; - &lt;em&gt;(Patient_ID)Pharmacy transactions&lt;/em&gt; = one-many&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;(Visit_ID)Visit table&lt;/em&gt; - &lt;em&gt;(Visit_ID)FacVisits&lt;/em&gt; = one-one&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;(Visit_ID)Visit table&lt;/em&gt; - &lt;em&gt;(Visit_ID)DimDate&lt;/em&gt; = one-one&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;(Visit_ID)Visit table&lt;/em&gt; - &lt;em&gt;(Visit_ID)DimVisits&lt;/em&gt; = one-one&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Data Analysis
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;To visualise the diseases most common across counties&lt;/strong&gt;, used a &lt;strong&gt;matrix table&lt;/strong&gt;. The rows had the &lt;em&gt;county&lt;/em&gt; from the &lt;em&gt;patients&lt;/em&gt; table, and the columns were the &lt;em&gt;diseases&lt;/em&gt; from the &lt;em&gt;visits&lt;/em&gt; table. The value field was the count of &lt;em&gt;visitID&lt;/em&gt;. Formatted the cells to show red colour for the max values, white colour for middle and blue colour for the minimum values. From the chart, &lt;strong&gt;Typhoid &amp;amp; Diabetes are common in Kiambu, Hypertension is most common in Kisumu, Diabetes is most common in Mombasa, Pneumonia is most common in the capital city and the flu is most common in Nakuru and Uasin Gishu.&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;To visualise which departments generate a higher pharmacy revenue&lt;/strong&gt; - used a pie chart with the value being the &lt;strong&gt;total cost&lt;/strong&gt; &amp;amp; the legend being the &lt;strong&gt;department&lt;/strong&gt;. From the charts, &lt;strong&gt;Inpatient&amp;gt;&amp;gt;&amp;gt;Emergency&amp;gt;&amp;gt;&amp;gt;Outpatient&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;To visualise which age groups consume the most drugs&lt;/strong&gt; - used a stacked bar chart. Used a &lt;strong&gt;Switch&lt;/strong&gt; function to come up with a column that will group ages into:&lt;br&gt;
&amp;lt;1yr - Infant&lt;br&gt;
1-14yrs - Child&lt;br&gt;
15-44yrs - Young adult&lt;br&gt;
45-59yrs - Middle Age&lt;br&gt;
60-74yrs - Elderly&lt;br&gt;
75+yrs - Senior&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Age_Group = SWITCH(
    TRUE(),
    [Age] &amp;lt; 1, "Infant",
    [Age] &amp;gt;= 1 &amp;amp;&amp;amp; [Age] &amp;lt;= 14, "Child",
    [Age] &amp;gt;= 15 &amp;amp;&amp;amp; [Age] &amp;lt;= 44, "Young Adult",
    [Age] &amp;gt;= 45 &amp;amp;&amp;amp; [Age] &amp;lt;= 59, "Middle-age",
    [Age] &amp;gt;= 60 &amp;amp;&amp;amp; [Age] &amp;lt;= 74, "Elderly",
    [Age] &amp;gt;= 75, "Senior",
    "Unknown")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
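&lt;p&gt;For readers more comfortable in code, the cascading conditions of &lt;code&gt;SWITCH(TRUE(), ...)&lt;/code&gt; map directly onto an if-chain where the first matching branch wins. A small Python sketch of the same bucketing (the "Unknown" fallback for blank ages is omitted here):&lt;/p&gt;

```python
def age_group(age):
    # Mirrors the DAX SWITCH(TRUE(), ...) column: first matching branch wins
    if age < 1:
        return "Infant"
    if age <= 14:
        return "Child"
    if age <= 44:
        return "Young Adult"
    if age <= 59:
        return "Middle-age"
    if age <= 74:
        return "Elderly"
    return "Senior"

print([age_group(a) for a in [0, 7, 30, 50, 70, 80]])
# ['Infant', 'Child', 'Young Adult', 'Middle-age', 'Elderly', 'Senior']
```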



&lt;p&gt;Used a stacked bar chart of age group vs the drug count. &lt;br&gt;
From the chart, &lt;strong&gt;young adults use more drugs than any other age group&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;To visualise whether a high number of patients always leads to higher pharmacy revenue,&lt;/strong&gt; I used a combo chart. I created a new measure, &lt;code&gt;Total Visits = COUNTROWS(Visits)&lt;/code&gt;, which counts the visits made to the facility. The columns were the total cost, the line was the total visits, and the x-axis was the months. For statistical rigour, I also plotted a scatter plot where the x-axis was total visits, the y-axis was total cost, and the points were coloured by department. From the charts, &lt;strong&gt;we can conclude that a high number of patients leads to higher pharmacy revenue, as evidenced by the peak months of July &amp;amp; August. The scatter plot has a positive gradient, confirming the inference.&lt;/strong&gt;  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;To visualise whether some diagnoses are associated with longer hospital stays but less pharmacy spending&lt;/strong&gt;, I plotted a combo chart. The x-axis was the diagnosis, the column was the total cost, and the line was the total length of stay. From the chart, &lt;strong&gt;typhoid is associated with a longer hospital stay and higher pharmacy spending, while the flu is associated with a longer hospital stay but little pharmacy spending&lt;/strong&gt;.   &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Dashboard
&lt;/h2&gt;

&lt;p&gt;KPI cards were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Total visits - the &lt;em&gt;Total Visits&lt;/em&gt; measure, i.e. &lt;code&gt;Total Visits = COUNTROWS(Visits)&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Total pharmacy revenue - the &lt;em&gt;Total Revenue&lt;/em&gt; measure, i.e. &lt;code&gt;Total Revenue = SUM(Pharmacy_Transactions[Total_Cost])&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Average days spent - the &lt;em&gt;Average days stayed&lt;/em&gt; measure, i.e. &lt;code&gt;Average days stayed = AVERAGE(Visits[Length_of_Stay_Days])&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Plotted charts showing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Disease trends over time - a line chart where the x-axis is the quarter, the y-axis is the count of diagnoses, and the legend is the diagnosis&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Pharmacy cost breakdown by category - a stacked bar chart where the x-axis is the diagnosis and the y-axis is total revenue&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;County and department comparisons - a stacked bar chart where the x-axis is the county, the y-axis is the count of visits, and the legend is the department&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Slicers for county, department, and visit date&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>analytics</category>
      <category>data</category>
      <category>datascience</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>SCHEMAS AND DATA MODELLING - POWER BI</title>
      <dc:creator>Dishon Gatambia (Dd)</dc:creator>
      <pubDate>Tue, 03 Feb 2026 14:45:34 +0000</pubDate>
      <link>https://dev.to/dishon_gatambiadd_31a1/schemas-and-data-modelling-power-bi-551g</link>
      <guid>https://dev.to/dishon_gatambiadd_31a1/schemas-and-data-modelling-power-bi-551g</guid>
      <description>&lt;p&gt;Data modelling is a foundational aspect of Power BI that determines how data is structured, related, and queried for analysis. Data modelling defines the structural relationship between tables to ensure query performance and reporting accuracy. This article explores key concepts like star and snowflake schemas, fact and dimension tables, relationships, and why prioritizing good design is essential for optimal performance and reliable reporting.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Star Schema
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;star schema&lt;/strong&gt; is the standard for Power BI. It utilises a central fact table connected to multiple dimension tables.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Fact Tables&lt;/em&gt;: Contain quantitative metrics (e.g., sales, temperature) and foreign keys. They define the model's granularity.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Dimension Tables&lt;/em&gt;: Contain descriptive attributes such as business attributes (e.g., product names, dates). They provide the context for filtering and grouping.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Relationships in a star schema are typically one-to-many, flowing from the dimension table to the fact table. This configuration minimizes query complexity. Usually the fact tables represent the "many" while dimension tables represent the "one" aspect in the relationships. &lt;/p&gt;
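&lt;p&gt;To make the one-to-many idea concrete, here is a minimal sketch in Python/pandas with invented product and sales tables; &lt;code&gt;validate="many_to_one"&lt;/code&gt; asserts the star-schema cardinality at join time:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical dimension table: one row per product (the "one" side)
dim_product = pd.DataFrame({
    "Product_ID": [1, 2],
    "Product_Name": ["Widget", "Gadget"],
})

# Hypothetical fact table: many rows per product (the "many" side)
fact_sales = pd.DataFrame({
    "Product_ID": [1, 1, 2],
    "Amount": [10.0, 20.0, 5.0],
})

# A typical star-schema query: join the dimension onto the fact,
# then group by a descriptive attribute to aggregate the metric
joined = fact_sales.merge(dim_product, on="Product_ID", validate="many_to_one")
totals = joined.groupby("Product_Name")["Amount"].sum()
print(totals)
```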

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk1da822l3awcphbhfxld.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk1da822l3awcphbhfxld.png" alt="An illustration of the star schema showing a central fact table surrounded by multiple dimension tables" width="800" height="546"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Snowflake Schema
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;snowflake schema&lt;/strong&gt; normalises dimension tables into sub-dimensions (e.g., a product dimension split into separate tables for category, subcategory, and product details). While this reduces data redundancy at the source, it is inefficient for Power BI. This is because it results in more tables, longer filter propagation chains, increased model complexity, and poorer performance due to additional joins. &lt;br&gt;
Snowflake schemas may be useful in specific scenarios (e.g., when source data is heavily normalised or storage is a major constraint).&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Performance Impact&lt;/em&gt;: Increased table counts and longer filter propagation chains degrade speed.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Recommendation&lt;/em&gt;: Denormalise data into single-dimension tables to simplify the semantic model and improve usability.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7cppsf0qku2etfgsfy0o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7cppsf0qku2etfgsfy0o.png" alt="Illustration of the Snowflake schema" width="318" height="159"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Fact and Dimension Tables
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fact tables&lt;/strong&gt; contain measurable data (e.g., sales orders, inventory levels) with foreign keys linking to dimensions and numeric columns for aggregation. They grow over time and define the model's granularity—ensuring consistency is critical to avoid inaccurate summaries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dimension tables&lt;/strong&gt; hold descriptive attributes (e.g., product names, customer details, dates) with a unique key (often a surrogate key for handling changes like slowly changing dimensions). They are smaller and support hierarchies for drilling down in reports.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Relationships in Power BI
&lt;/h2&gt;

&lt;p&gt;Relationships define how tables connect and how filters propagate. &lt;br&gt;
&lt;em&gt;Directionality&lt;/em&gt;: Single-direction filter propagation is preferred. Bi-directional and many-to-many relationships should be limited, as they increase logic complexity and slow queries.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Active vs. Inactive&lt;/em&gt;: Active relationships are the default path for filter propagation. Role-playing dimensions (e.g., multiple date types) should be handled via separate tables rather than complex inactive relationship chains.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Good Modelling Is Critical
&lt;/h2&gt;

&lt;p&gt;Poor data modelling leads to bloated models, slow report rendering, inaccurate results (e.g., from inconsistent granularity), and a confusing user experience. A well-designed star schema optimises compression, reduces query complexity, and scales better for large datasets.&lt;br&gt;
In contrast, flat tables or excessive snowflaking increase redundancy or joins, hurting refresh times and report interactivity. Best practices include using Power Query for transformations, surrogate keys, and avoiding unnecessary columns to keep models lean.&lt;br&gt;
In summary, prioritising star schema principles in Power BI data modelling delivers faster, more reliable insights—making it a cornerstone for any serious BI implementation. &lt;/p&gt;

</description>
      <category>analytics</category>
      <category>data</category>
      <category>dataengineering</category>
      <category>microsoft</category>
    </item>
    <item>
      <title>Getting Started with Git &amp; Github</title>
      <dc:creator>Dishon Gatambia (Dd)</dc:creator>
      <pubDate>Sun, 25 Jan 2026 19:17:09 +0000</pubDate>
      <link>https://dev.to/dishon_gatambiadd_31a1/getting-started-with-git-github-3kcc</link>
      <guid>https://dev.to/dishon_gatambiadd_31a1/getting-started-with-git-github-3kcc</guid>
      <description>&lt;h1&gt;
  
  
  &lt;strong&gt;What is Git?&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Git is a distributed version control system capable of managing versions of source code or data. It is often used by programmers developing software collaboratively.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Why is version control important?&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;Version control (also known as source control or revision control) is a system that records and manages changes to a file or set of files, most commonly source code, over time.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The importance of version control is listed below: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Safety: you can undo mistakes without losing work.&lt;/li&gt;
&lt;li&gt;Collaboration: multiple people can work on the same codebase without overwriting each other.&lt;/li&gt;
&lt;li&gt;Accountability: history shows who changed what and why (commit messages).&lt;/li&gt;
&lt;li&gt;Experimentation: branches let you try features without risking the main code.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Pulling code from Github (step by step guide)&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;Code retrieval from GitHub is categorized into two primary operations: Initial Acquisition (Clone) and Synchronization (Pull)&lt;/em&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Clone:
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Used when the code doesn't exist on the local machine&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Identify Repository URL:&lt;/strong&gt; On the GitHub repository page, click the Code button and copy the URL (HTTPS or SSH).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Initialize Download:&lt;/strong&gt; Open the terminal and execute: &lt;code&gt;git clone &amp;lt;repository-url&amp;gt;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Git creates a directory named after the repository, downloads all files, branches, and the full commit history, and configures the remote reference (origin).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Pull/Synchronization
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Used to update an existing local repository with changes from Github&lt;/em&gt;&lt;br&gt;
&lt;strong&gt;- Navigate to Directory:&lt;/strong&gt; Enter the local project folder.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Execute Update:&lt;/strong&gt; Run the following command: &lt;code&gt;git pull origin &amp;lt;branch-name&amp;gt;&lt;/code&gt;  Example: &lt;code&gt;git pull origin main&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Mechanism:&lt;/strong&gt; This command executes two sub-operations:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fetch:&lt;/strong&gt; Downloads remote data without altering local files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Merge:&lt;/strong&gt; Integrates remote changes into the current active branch.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Pushing code to Github (step-by-step guide)&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;Pushing code to GitHub requires a staged sequence: preparing files, creating a local snapshot, and transmitting data to the remote server.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Initial Upload (New Repository)
&lt;/h3&gt;

&lt;p&gt;Use this sequence to link a local project to a newly created GitHub repository.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Create Remote Repository:&lt;/strong&gt; On GitHub, create a new repository. Do not initialize with a README, .gitignore, or license if code already exists locally.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Initialize Local Git:&lt;/strong&gt; Navigate to the project root and execute: &lt;code&gt;git init&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Stage Files:&lt;/strong&gt; Add all files to the staging area: &lt;code&gt;git add .&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Create First Commit:&lt;/strong&gt; Record the snapshot: &lt;code&gt;git commit -m "initial commit"&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Define Main Branch:&lt;/strong&gt; Ensure the primary branch is named &lt;code&gt;main&lt;/code&gt;: &lt;code&gt;git branch -M main&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Link Remote URL:&lt;/strong&gt; Connect the local repository to GitHub: &lt;code&gt;git remote add origin &amp;lt;github-repo-url&amp;gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Execute Push:&lt;/strong&gt; Upload the code and set the upstream reference: &lt;code&gt;git push -u origin main&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2.  Standard Synchronization (Existing Repository)
&lt;/h3&gt;

&lt;p&gt;Use this sequence for ongoing updates to a repository that is already linked.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage Specific Changes:&lt;/strong&gt; &lt;code&gt;git add &amp;lt;file-name&amp;gt;&lt;/code&gt; (or &lt;code&gt;git add .&lt;/code&gt; for all changes)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Commit:&lt;/strong&gt; &lt;code&gt;git commit -m "description of changes"&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Push:&lt;/strong&gt; &lt;code&gt;git push&lt;/code&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Git Tracking (step-by-step guide)
&lt;/h1&gt;

&lt;p&gt;Git tracks changes by managing data between three logical states: &lt;strong&gt;the Working Directory (unsaved changes),&lt;/strong&gt; &lt;strong&gt;the Staging Area (prepared changes),&lt;/strong&gt; and the &lt;strong&gt;Local Repository (permanent history).&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Initialization
&lt;/h3&gt;

&lt;p&gt;Activate Git tracking in a project directory: &lt;code&gt;git init&lt;/code&gt;. This creates a &lt;code&gt;.git&lt;/code&gt; subdirectory to store metadata and object databases.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. State Verification
&lt;/h3&gt;

&lt;p&gt;Determine the current status of files: &lt;code&gt;git status&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Untracked:&lt;/strong&gt; New files unknown to Git.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Modified:&lt;/strong&gt; Tracked files with unsaved changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Staged:&lt;/strong&gt; Changes moved to the Index, ready for the next snapshot.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Change Preparation (Staging)
&lt;/h3&gt;

&lt;p&gt;Select specific changes for inclusion in the next version:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Single file: &lt;code&gt;git add &amp;lt;file-name&amp;gt;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;All changes: &lt;code&gt;git add .&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Interactive (partial file): &lt;code&gt;git add -p&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Version Finalization (Commit)
&lt;/h3&gt;

&lt;p&gt;Record the staged changes as a permanent snapshot: &lt;code&gt;git commit -m "Direct description of change"&lt;/code&gt;. &lt;em&gt;Standard format:&lt;/em&gt; Use the imperative mood (e.g., "Fix logic error" rather than "Fixed logic error").&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Change Analysis Tools
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Real-time Differences
&lt;/h4&gt;

&lt;p&gt;Analyze modifications before staging or committing:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Unstaged changes:&lt;/strong&gt; &lt;code&gt;git diff&lt;/code&gt; (Compares working directory to staging area)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Staged changes:&lt;/strong&gt; &lt;code&gt;git diff --staged&lt;/code&gt; (Compares staging area to last commit)&lt;/p&gt;

&lt;h4&gt;
  
  
  Historical Review
&lt;/h4&gt;

&lt;p&gt;Inspect the chronological record of changes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Summary list:&lt;/strong&gt; &lt;code&gt;git log --oneline&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Detailed patches:&lt;/strong&gt; &lt;code&gt;git log -p&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- File-specific history:&lt;/strong&gt; &lt;code&gt;git log -- &amp;lt;file-path&amp;gt;&lt;/code&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Line-Level Attribution
&lt;/h4&gt;

&lt;p&gt;Identify when and by whom specific lines were altered: &lt;code&gt;git blame &amp;lt;file-name&amp;gt;&lt;/code&gt;&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>git</category>
      <category>github</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to Excel in Data Analytics: A Beginner's Guide - MS Excel for Data Analytics</title>
      <dc:creator>Dishon Gatambia (Dd)</dc:creator>
      <pubDate>Sun, 25 Jan 2026 19:13:22 +0000</pubDate>
      <link>https://dev.to/dishon_gatambiadd_31a1/how-to-excel-in-data-analytics-a-beginners-guide-ms-excel-for-data-analytics-3jl2</link>
      <guid>https://dev.to/dishon_gatambiadd_31a1/how-to-excel-in-data-analytics-a-beginners-guide-ms-excel-for-data-analytics-3jl2</guid>
      <description>&lt;p&gt;MS Excel is one of the most beginner-friendly tools for data analysis. It allows users to manipulate, process, and visualise large datasets efficiently, turning raw data into meaningful insights.  &lt;/p&gt;

&lt;h2&gt;
  
  
  Why MS Excel for data analytics
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Useful for many entry-level analysis tasks, e.g. formulas, tables, conditional formatting, charts, PivotTables, etc.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It's fast and easy&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Widely available in many institutions, workplaces and schools.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What we will learn in this article
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Data cleaning: Removing duplicates, sorting &amp;amp; filtering&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Simple formulas &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Conditional formatting&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data Visualisation: Tables, Charts, Pivot tables&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Data Cleaning
&lt;/h2&gt;

&lt;p&gt;Before data analysis commences, it's important to clean and organize the dataset to ensure accuracy. The common cleaning tasks are: &lt;/p&gt;

&lt;h3&gt;
  
  
  Removing duplicates
&lt;/h3&gt;

&lt;p&gt;Use &lt;strong&gt;Data &amp;gt; Remove Duplicates&lt;/strong&gt;. This removes redundancy in a dataset. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fndnyabuf3fne05kx5rs7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fndnyabuf3fne05kx5rs7.png" alt=" " width="793" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Sorting Data
&lt;/h3&gt;

&lt;p&gt;Sorting data makes it easier to immediately view and comprehend your data, organize and locate the facts you need, and ultimately help you make better decisions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgunm8ney9ybwf7v3o1v1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgunm8ney9ybwf7v3o1v1.png" alt=" " width="738" height="453"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Filtering Data
&lt;/h3&gt;

&lt;p&gt;This function displays only the rows that match specified criteria, hiding the rest. &lt;br&gt;
Select any column in the table, then go to the Data tab on the ribbon and, in the Sort &amp;amp; Filter group, select Filter.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fht4gtzcbae7uwcn6getf.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fht4gtzcbae7uwcn6getf.jpg" alt=" " width="450" height="445"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Simple Formulas
&lt;/h2&gt;

&lt;p&gt;Built-in functions in Excel enable users to perform calculations easily. Below are some examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;h3&gt;
  
  
  Sum
&lt;/h3&gt;

&lt;p&gt;Adds a range of numbers: &lt;code&gt;=SUM(A2:A877)&lt;/code&gt;. This adds all values from cell A2 to A877.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;h3&gt;
  
  
  Average
&lt;/h3&gt;

&lt;p&gt;Calculates the average of numbers: &lt;code&gt;=AVERAGE(A2:A877)&lt;/code&gt;. This displays the average of the values between cells A2 and A877.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;h3&gt;
  
  
  Count
&lt;/h3&gt;

&lt;p&gt;Counts how many cells contain numbers: &lt;code&gt;=COUNT(A2:A877)&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;h3&gt;
  
  
  CountA
&lt;/h3&gt;

&lt;p&gt;Counts cells that aren't empty: &lt;code&gt;=COUNTA(A2:A877)&lt;/code&gt;&lt;br&gt;
There are many more formulas in Excel; the above are just a few of them.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
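&lt;p&gt;For comparison, the same aggregations can be sketched in Python with pandas (a hypothetical column standing in for A2:A877). Like Excel, pandas skips empty cells when summing and averaging:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical numeric column standing in for A2:A877, with one empty cell
col = pd.Series([10, 20, None, 40])

print(col.sum())               # like =SUM(...):     70.0
print(col.mean())              # like =AVERAGE(...): mean of the 3 filled cells
print(col.count())             # like =COUNT(...):   3 numeric cells
print(int(col.notna().sum()))  # like =COUNTA(...):  3 non-empty cells
```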

&lt;h2&gt;
  
  
  Conditional Formatting
&lt;/h2&gt;

&lt;p&gt;Trends and patterns in a dataset can be highlighted for easier visualisation.&lt;br&gt;
&lt;strong&gt;Step 1: Go to Home &amp;gt; Conditional Formatting.&lt;/strong&gt;&lt;br&gt;
Select any column in the table. Then go to the Home tab on the ribbon and, in the Styles group, select Conditional Formatting; under Highlight Cells Rules, choose the Greater Than option. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftzkdtrbzqgqgwndgju7n.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftzkdtrbzqgqgwndgju7n.jpeg" alt=" " width="800" height="369"&gt;&lt;/a&gt;&lt;br&gt;
Then a &lt;strong&gt;Greater Than&lt;/strong&gt; dialog box appears. Enter the cutoff value (here, for the quarter column) and then select the colour.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmq9s95rw9ndh9f7iasqc.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmq9s95rw9ndh9f7iasqc.jpeg" alt=" " width="800" height="369"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Step 2: Preview Result&lt;/strong&gt;&lt;br&gt;
As you can see in the excel table &lt;strong&gt;'quarter' column changes the colour of the values that are greater than 6.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwo0dpuss92sa1v6ur9u1.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwo0dpuss92sa1v6ur9u1.jpeg" alt=" " width="800" height="369"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Charts
&lt;/h2&gt;

&lt;p&gt;Any set of information can be represented graphically using a chart. Excel offers several chart types to choose from. Charts make it easier to identify trends and relationships. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select your dataset and go to Insert &amp;gt; Charts.&lt;/li&gt;
&lt;li&gt;Choose from bar charts, line charts, or pie charts.&lt;/li&gt;
&lt;li&gt;Customize the chart for clarity and impact.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz70ttia4lf3ty7fjyv2h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz70ttia4lf3ty7fjyv2h.png" alt=" " width="479" height="553"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Pivot Tables
&lt;/h2&gt;

&lt;p&gt;So what are pivot tables? They summarise, analyse, and reorganise large datasets without formulas, letting you drag and drop fields to build interactive reports. &lt;br&gt;
&lt;strong&gt;Steps to create a Pivot Table:&lt;/strong&gt;&lt;br&gt;
Select your &lt;strong&gt;data&lt;/strong&gt;&lt;br&gt;
Go to &lt;strong&gt;Insert &amp;gt; PivotTable&lt;/strong&gt;&lt;br&gt;
Drag fields into &lt;strong&gt;Rows, Columns, and Values&lt;/strong&gt;&lt;br&gt;
Example:&lt;/p&gt;

&lt;p&gt;Rows &amp;gt; Department&lt;br&gt;
Values &amp;gt; Count of Expense Type&lt;br&gt;
The count of expense type will automatically be done by Excel.&lt;/p&gt;
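&lt;p&gt;The Department/Expense Type example above can be sketched in pandas, whose &lt;code&gt;pivot_table&lt;/code&gt; mirrors Excel's drag-and-drop fields (sample data is invented):&lt;/p&gt;

```python
import pandas as pd

# Hypothetical expense records
expenses = pd.DataFrame({
    "Department": ["HR", "HR", "IT"],
    "Expense Type": ["Travel", "Training", "Hardware"],
})

# Rows well -> Department, Values well -> Count of Expense Type
pivot = expenses.pivot_table(index="Department",
                             values="Expense Type",
                             aggfunc="count")
print(pivot)
```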

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9dvf74f8lx6izfms99aq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9dvf74f8lx6izfms99aq.png" alt=" " width="800" height="439"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Dashboards
&lt;/h2&gt;

&lt;p&gt;Finally, with all these tools and more, we can now build dashboards. Dashboards are paramount in data analysis. So what are dashboards? Like a car's dashboard, which displays the vehicle's combined data, they consolidate key business metrics, charts, and tables into a single view for easy analysis. A dashboard typically consists of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Charts&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Slicers (interactive filters)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Pivot tables&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo7jdk8wsedjjl2ph6554.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo7jdk8wsedjjl2ph6554.jpg" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>analytics</category>
      <category>beginners</category>
      <category>data</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
