<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Dishon Gatambia (Dd)</title>
    <description>The latest articles on DEV Community by Dishon Gatambia (Dd) (@dishon_gatambiadd_31a1).</description>
    <link>https://dev.to/dishon_gatambiadd_31a1</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3708640%2F288398dc-77a5-42f9-9de4-9c1047acd98f.png</url>
      <title>DEV Community: Dishon Gatambia (Dd)</title>
      <link>https://dev.to/dishon_gatambiadd_31a1</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dishon_gatambiadd_31a1"/>
    <language>en</language>
    <item>
      <title>Connecting Power BI to SQL Databases</title>
      <dc:creator>Dishon Gatambia (Dd)</dc:creator>
      <pubDate>Tue, 24 Mar 2026 11:22:48 +0000</pubDate>
      <link>https://dev.to/dishon_gatambiadd_31a1/connecting-power-bi-to-sql-databases-288l</link>
      <guid>https://dev.to/dishon_gatambiadd_31a1/connecting-power-bi-to-sql-databases-288l</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;A practical guide to integrating Power BI Desktop with local PostgreSQL and cloud-hosted Aiven databases - including data modelling and why SQL still matters.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Introduction: Power BI and SQL Databases&lt;/li&gt;
&lt;li&gt;Connecting to a Local PostgreSQL Database&lt;/li&gt;
&lt;li&gt;Connecting to Aiven Cloud PostgreSQL&lt;/li&gt;
&lt;li&gt;Loading Tables and Creating Relationships&lt;/li&gt;
&lt;li&gt;Why SQL Skills Matter for Power BI Analysts&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  1. Introduction: Power BI and SQL Databases
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Microsoft Power BI&lt;/strong&gt; is one of the leading business intelligence platforms in use today. It enables organisations of all sizes to transform raw data into interactive dashboards, reports, and visualisations that help decision-makers act on evidence rather than intuition. From tracking monthly sales performance to monitoring operational KPIs in real time, Power BI sits at the centre of how modern businesses consume their data.&lt;/p&gt;

&lt;p&gt;Power BI is available in several forms. &lt;strong&gt;Power BI Desktop&lt;/strong&gt; is the Windows application used to build reports and data models. &lt;strong&gt;Power BI Service&lt;/strong&gt; is the cloud-based platform where those reports are published and shared across an organisation. Together, they cover the full lifecycle of analytical work - from raw data connection to executive-level dashboarding.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why connect Power BI to a database?
&lt;/h3&gt;

&lt;p&gt;While Power BI can import data from Excel files, CSV exports, and web APIs, these sources have significant limits. They are static, often out of date, and difficult to maintain at scale. A well-structured relational database, by contrast, is the authoritative source of truth for most business data. It stores transactions, customer records, inventory levels, and operational events with precision, consistency, and referential integrity.&lt;/p&gt;

&lt;p&gt;When Power BI connects directly to a database, analysts can query the freshest available data, apply complex filters at the database level, and avoid the overhead of manually exporting and re-importing flat files. The database handles storage and retrieval efficiently; Power BI handles visualisation and exploration. Each tool does what it does best.&lt;/p&gt;

&lt;h3&gt;
  
  
  The role of SQL databases in analytical workflows
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;SQL (Structured Query Language)&lt;/strong&gt; databases, including PostgreSQL, Microsoft SQL Server, and MySQL, are the backbone of most enterprise data architectures. They organise data into tables with clearly defined schemas, enforce relationships between entities, and support powerful querying through the SQL language.&lt;/p&gt;

&lt;p&gt;PostgreSQL, in particular, is an open-source relational database widely used in both development and production environments. It supports advanced data types, complex joins, window functions, and JSON storage, making it a versatile choice for analytical workloads. Whether self-hosted on a local machine or managed in the cloud through platforms like &lt;strong&gt;Aiven&lt;/strong&gt;, PostgreSQL integrates cleanly with Power BI.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Connecting to a Local PostgreSQL Database
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What this guide covers:&lt;/strong&gt; connecting Power BI to both a local PostgreSQL instance and a cloud-hosted Aiven database, data modelling with four linked tables (&lt;code&gt;customers&lt;/code&gt;, &lt;code&gt;products&lt;/code&gt;, &lt;code&gt;sales&lt;/code&gt;, and &lt;code&gt;inventory&lt;/code&gt;), and a closing discussion of why SQL fluency is valuable for BI analysts.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;A &lt;strong&gt;local PostgreSQL database&lt;/strong&gt; runs on the same machine as Power BI Desktop, or on a machine within your local network. This is the standard setup for development, testing, or environments where the data does not leave the building. The connection process requires no SSL configuration and is straightforward once PostgreSQL is running and a database exists.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;Before beginning, confirm the following are in place:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Power BI Desktop is installed (Windows only).&lt;/li&gt;
&lt;li&gt;PostgreSQL is installed, and the target database is created.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;Npgsql PostgreSQL connector&lt;/strong&gt; is installed. Power BI requires this driver to communicate with PostgreSQL. Download it from the official Npgsql releases page and install it before opening Power BI.&lt;/li&gt;
&lt;li&gt;You know the database name, a valid username, and its password.&lt;/li&gt;
&lt;/ul&gt;
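&lt;p&gt;Before opening Power BI, it is worth confirming that PostgreSQL is running and the database is reachable. A quick command-line check (the database name &lt;code&gt;assignment&lt;/code&gt; matches the example used later in this guide):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# list the tables in the target database; if this fails, Power BI will fail too
psql -h localhost -U postgres -d assignment -c "\dt"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;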

&lt;h3&gt;
  
  
  Step-by-step connection process
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Step 1 - Open Power BI Desktop&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Launch the application. On the start screen, click &lt;strong&gt;Get Data&lt;/strong&gt;. If you are already inside a report, navigate to &lt;strong&gt;Home → Get Data&lt;/strong&gt; in the ribbon.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2 - Search for PostgreSQL Database&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the &lt;em&gt;Get Data&lt;/em&gt; dialogue, type &lt;code&gt;PostgreSQL&lt;/code&gt; in the search box. Select &lt;strong&gt;PostgreSQL Database&lt;/strong&gt; from the results and click &lt;strong&gt;Connect&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0hlhp7c4bfl7epk043dy.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0hlhp7c4bfl7epk043dy.jpg" alt="Figure 1 - Selecting PostgreSQL Database in the Get Data dialogue" width="360" height="605"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3 - Enter the server and database details&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the connection dialogue, fill in two fields:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Server&lt;/strong&gt; - enter &lt;code&gt;localhost&lt;/code&gt; for a local instance, or a hostname/IP address for a network server.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database&lt;/strong&gt; - enter the exact name of the PostgreSQL database you want to connect to (for example, &lt;code&gt;assignment&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Leave the &lt;em&gt;Data Connectivity mode&lt;/em&gt; as &lt;strong&gt;Import&lt;/strong&gt; unless you specifically require DirectQuery.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcly0kzxbxtu4qknvwymt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcly0kzxbxtu4qknvwymt.png" alt="Figure 2 - Entering server name and database name for the local PostgreSQL connection" width="315" height="160"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4 - Provide credentials&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Power BI will prompt for a username and password. Select &lt;strong&gt;Database&lt;/strong&gt; under the credential type dropdown, then enter your PostgreSQL username (often &lt;code&gt;postgres&lt;/code&gt;) and the corresponding password. Click &lt;strong&gt;Connect&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5 - Select and load tables&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;Navigator&lt;/em&gt; pane will display all schemas and tables in the database. Check the tables you want to load - for example, &lt;code&gt;customers&lt;/code&gt;, &lt;code&gt;products&lt;/code&gt;, &lt;code&gt;sales&lt;/code&gt;, and &lt;code&gt;inventory&lt;/code&gt;. Click &lt;strong&gt;Load&lt;/strong&gt; to import them directly, or &lt;strong&gt;Transform Data&lt;/strong&gt; to open the Power Query Editor first.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Connection flow:

 [Power BI Desktop] - [Get Data / PostgreSQL] - [Server &amp;amp; Credentials] - [Navigator] - [Load]
      Step 1                  Step 2                    Steps 3–4           Step 5        Final
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Figure 3 - Local PostgreSQL connection flow&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Tip:&lt;/strong&gt; If the connection fails with a driver error, install the Npgsql connector and restart Power BI Desktop before trying again.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  3. Connecting to Aiven Cloud PostgreSQL
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Aiven&lt;/strong&gt; is a managed cloud database platform that hosts PostgreSQL (and other databases) on your choice of cloud provider - AWS, Google Cloud, or Azure. Connecting Power BI to an Aiven PostgreSQL instance follows the same general steps as a local connection, with two important differences: the connection details are specific to your Aiven service, and &lt;strong&gt;SSL must be used&lt;/strong&gt; to encrypt the connection.&lt;/p&gt;

&lt;h3&gt;
  
  
  Obtaining connection details from Aiven
&lt;/h3&gt;

&lt;p&gt;Log in to the &lt;a href="https://console.aiven.io" rel="noopener noreferrer"&gt;Aiven Console&lt;/a&gt; and open your PostgreSQL service. On the service overview page, you will find all the information needed to establish a connection:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;Where to find it&lt;/th&gt;
&lt;th&gt;Example value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Host&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Service Overview - Connection Information&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pg-abc123.aivencloud.com&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Port&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Next to the host, typically a custom port&lt;/td&gt;
&lt;td&gt;&lt;code&gt;15432&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Database&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Listed under the service name&lt;/td&gt;
&lt;td&gt;&lt;code&gt;defaultdb&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Username&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Connection Information section&lt;/td&gt;
&lt;td&gt;&lt;code&gt;avnadmin&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Password&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Click the eye icon or copy button&lt;/td&gt;
&lt;td&gt;&lt;em&gt;(hidden - copy directly)&lt;/em&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SSL Certificate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Download button in Connection Information&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ca.pem&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Download the &lt;strong&gt;CA certificate&lt;/strong&gt; (&lt;code&gt;ca.pem&lt;/code&gt;) and save it to a location you can reference easily, such as &lt;code&gt;C:\certs\aiven-ca.pem&lt;/code&gt; on Windows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why SSL certificates are required
&lt;/h3&gt;

&lt;p&gt;A cloud database is accessible over the public internet. Without encryption, data transmitted between Power BI and the Aiven server - including credentials and query results - would be visible to anyone monitoring the network. SSL (Secure Sockets Layer) / TLS (Transport Layer Security) encrypts the entire connection, preventing interception.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;CA certificate&lt;/strong&gt; serves a second purpose: it allows Power BI to verify that it is connecting to the genuine Aiven server and not an impostor. This is known as certificate verification, and it protects against man-in-the-middle attacks. Aiven requires SSL on all connections; it cannot be disabled.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frm4d8icc890oh83l8wmu.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frm4d8icc890oh83l8wmu.webp" alt="Aiven cloud connection with SSL/TLS encryption" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step-by-step: connecting via Power BI Desktop
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Step 1 - Open Get Data - PostgreSQL Database&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Follow the same steps as the local connection: &lt;strong&gt;Home → Get Data → PostgreSQL Database → Connect&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2 - Enter the Aiven host and port&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the &lt;strong&gt;Server&lt;/strong&gt; field, enter the full Aiven hostname followed by a colon and the port number:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pg-abc123.aivencloud.com:15432
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the &lt;strong&gt;Database&lt;/strong&gt; field, enter the database name (often &lt;code&gt;defaultdb&lt;/code&gt; unless you created a named database).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3 - Expand Advanced Options and add the SSL certificate&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the connection dialogue, click &lt;strong&gt;Advanced Options&lt;/strong&gt; to reveal additional fields. In the &lt;strong&gt;Additional connection string parameters&lt;/strong&gt; box, enter the SSL certificate path in the following format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;sslmode&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;verify-ca;sslrootcert=C:&lt;/span&gt;&lt;span class="se"&gt;\c&lt;/span&gt;&lt;span class="s"&gt;erts&lt;/span&gt;&lt;span class="se"&gt;\a&lt;/span&gt;&lt;span class="s"&gt;iven-ca.pem&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tells Power BI to use SSL, verify the server's certificate, and trust only certificates signed by the CA you downloaded.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4 - Enter credentials and connect&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Click &lt;strong&gt;OK&lt;/strong&gt;. When prompted, select &lt;strong&gt;Database&lt;/strong&gt; authentication, enter the Aiven username (typically &lt;code&gt;avnadmin&lt;/code&gt;) and password copied from the Aiven Console. Click &lt;strong&gt;Connect&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5 - Select tables in the Navigator&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Navigator pane will display the available tables. Select the required tables and click &lt;strong&gt;Load&lt;/strong&gt; or &lt;strong&gt;Transform Data&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Certificate path must use the correct syntax.&lt;/strong&gt; On Windows, if the single-backslash path shown above is not accepted, try forward slashes (for example, &lt;code&gt;C:/certs/aiven-ca.pem&lt;/code&gt;). If the &lt;code&gt;ca.pem&lt;/code&gt; file is not found, Power BI will fail to connect with an SSL handshake error. Verify the file exists at the exact path specified.&lt;/p&gt;
&lt;/blockquote&gt;
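&lt;p&gt;If the handshake error persists, the same parameters can be tested outside Power BI with &lt;code&gt;psql&lt;/code&gt;. The hostname, port, and certificate path below are the example values from the table above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# verify host, credentials, and CA certificate independently of Power BI
psql "host=pg-abc123.aivencloud.com port=15432 dbname=defaultdb user=avnadmin sslmode=verify-ca sslrootcert=C:/certs/aiven-ca.pem"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;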




&lt;h2&gt;
  
  
  4. Loading Tables and Creating Relationships
&lt;/h2&gt;

&lt;p&gt;Once the connection is established, Power BI loads the selected tables into its internal data model. For this guide, the PostgreSQL database contains four tables organised around a retail business scenario:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Table&lt;/th&gt;
&lt;th&gt;Primary key&lt;/th&gt;
&lt;th&gt;Key columns&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;customers&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;customer_id&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;first_name, last_name, email, registration_date, membership_status&lt;/td&gt;
&lt;td&gt;50 customer records with registration and membership data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;products&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;product_id&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;product_name, category, price, supplier, stock_quantity&lt;/td&gt;
&lt;td&gt;15 products across categories with pricing and supplier info&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sales&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;sale_id&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;customer_id (FK), product_id (FK), quantity_sold, sale_date, total_amount&lt;/td&gt;
&lt;td&gt;15 transaction records from 2023–2024&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;inventory&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;product_id&lt;/code&gt; (FK)&lt;/td&gt;
&lt;td&gt;stock_quantity&lt;/td&gt;
&lt;td&gt;Current stock levels for each product&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
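&lt;p&gt;The columns above imply a schema along the following lines. This DDL is an illustrative reconstruction, not a dump of the actual database; the column types are assumptions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;create table customers (
    customer_id       serial primary key,
    first_name        text not null,
    last_name         text not null,
    email             text unique,
    registration_date date,
    membership_status text
);

create table products (
    product_id     serial primary key,
    product_name   text not null,
    category       text,
    price          numeric(10, 2),
    supplier       text,
    stock_quantity integer
);

create table sales (
    sale_id       serial primary key,
    customer_id   integer references customers (customer_id),
    product_id    integer references products (product_id),
    quantity_sold integer,
    sale_date     date,
    total_amount  numeric(10, 2)
);

create table inventory (
    product_id     integer primary key references products (product_id),
    stock_quantity integer
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;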

&lt;h3&gt;
  
  
  How Power BI auto-detects relationships
&lt;/h3&gt;

&lt;p&gt;After loading these tables, Power BI may automatically detect relationships based on matching column names and data types. In this schema, it will likely identify that &lt;code&gt;sales.customer_id&lt;/code&gt; references &lt;code&gt;customers.customer_id&lt;/code&gt;, and that &lt;code&gt;sales.product_id&lt;/code&gt; references &lt;code&gt;products.product_id&lt;/code&gt;. The &lt;code&gt;inventory&lt;/code&gt; table shares &lt;code&gt;product_id&lt;/code&gt; with &lt;code&gt;products&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To inspect and manage relationships, navigate to the &lt;strong&gt;Model view&lt;/strong&gt; in Power BI Desktop, the icon that looks like three connected boxes in the left sidebar. Here you can see a visual map of all tables and the lines connecting them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fczgznlpxiz8haih3yzo7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fczgznlpxiz8haih3yzo7.png" alt="Data model showing relationships between the four tables" width="285" height="177"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating and editing relationships manually
&lt;/h3&gt;

&lt;p&gt;If Power BI does not detect relationships automatically, or if a detected relationship is incorrect, you can manage them manually. In the Model view, drag from a foreign key column in one table to the primary key in another. Power BI will draw the relationship line and ask you to confirm the cardinality and cross-filter direction.&lt;/p&gt;

&lt;p&gt;For this schema, the three key relationships are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;customers.customer_id&lt;/code&gt; - &lt;code&gt;sales.customer_id&lt;/code&gt; (one-to-many)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;products.product_id&lt;/code&gt; - &lt;code&gt;sales.product_id&lt;/code&gt; (one-to-many)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;products.product_id&lt;/code&gt; - &lt;code&gt;inventory.product_id&lt;/code&gt; (one-to-one)&lt;/li&gt;
&lt;/ul&gt;
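&lt;p&gt;Before confirming a relationship, it can be worth checking in SQL that the foreign keys actually line up. A quick orphan check against the source database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- sales rows whose customer_id has no matching customers row;
-- any results here mean the one-to-many relationship will misbehave
select s.sale_id, s.customer_id
from sales s
left join customers c on s.customer_id = c.customer_id
where c.customer_id is null;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;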

&lt;h3&gt;
  
  
  What data modelling enables
&lt;/h3&gt;

&lt;p&gt;With relationships in place, Power BI treats the tables as a unified data model rather than isolated datasets. When you build a chart showing total sales revenue by product category, Power BI knows to join &lt;code&gt;sales&lt;/code&gt; to &lt;code&gt;products&lt;/code&gt; via &lt;code&gt;product_id&lt;/code&gt; to retrieve the category name. When filtering by customer membership status, it traverses the relationship from &lt;code&gt;customers&lt;/code&gt; to &lt;code&gt;sales&lt;/code&gt; automatically.&lt;/p&gt;

&lt;p&gt;This is the core principle of &lt;strong&gt;star schema data modelling&lt;/strong&gt;: a central fact table (&lt;code&gt;sales&lt;/code&gt;) linked to dimension tables (&lt;code&gt;customers&lt;/code&gt;, &lt;code&gt;products&lt;/code&gt;) that describe the who, what, and when of each transaction. The &lt;code&gt;inventory&lt;/code&gt; table functions as a supplementary dimension providing current stock context alongside product data.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note on cross-filter direction:&lt;/strong&gt; By default, Power BI uses single-directional filtering; filters flow from the dimension table into the fact table. In most cases, this is correct. Avoid enabling bidirectional filtering unless you have a specific requirement, as it can produce unexpected aggregation results and slow report performance.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  5. Why SQL Skills Matter for Power BI Analysts
&lt;/h2&gt;

&lt;p&gt;Power BI's graphical interface makes it possible to build dashboards without writing a single line of SQL. But analysts who understand SQL bring a fundamentally different level of capability to their work. SQL is not a requirement for using Power BI; it is a requirement for using it &lt;strong&gt;well&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"An analyst who can write SQL is not just faster, they understand the data at a structural level that shapes every design decision they make in Power BI."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Four ways SQL strengthens Power BI work
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Precise data retrieval&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Rather than loading an entire table and filtering inside Power BI, an analyst with SQL knowledge writes a query that retrieves only the rows and columns needed. This reduces memory usage, speeds up refresh times, and keeps the data model lean.&lt;/p&gt;
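&lt;p&gt;For example, a report that only slices customers by membership has no need for the full table; a narrower query keeps the model lean:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- import only the columns the report actually uses, not select *
select customer_id, first_name, last_name, membership_status
from customers;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;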

&lt;p&gt;&lt;strong&gt;2. Filtering at the source&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;SQL &lt;code&gt;WHERE&lt;/code&gt; clauses filter data before it reaches Power BI. An analyst who understands this can avoid importing years of historical records when only the past 12 months are relevant to the dashboard; at scale, that is a significant difference.&lt;/p&gt;
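&lt;p&gt;For instance, restricting the &lt;code&gt;sales&lt;/code&gt; import to the trailing 12 months using PostgreSQL date arithmetic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- filter at the source so old history never leaves the database
select sale_id, customer_id, product_id, quantity_sold, sale_date, total_amount
from sales
where sale_date &gt;= current_date - interval '12 months';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;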

&lt;p&gt;&lt;strong&gt;3. Pre-aggregation and joins&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Complex calculations, such as monthly revenue per customer segment or average order value by product category, can be computed in SQL before the data is loaded. This offloads processing to the database engine, which handles large aggregations far more efficiently than Power BI's in-memory model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Data preparation and quality&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;SQL lets analysts clean and reshape data at the source (standardising date formats, handling nulls, concatenating name fields, or pivoting rows into columns) before the data ever reaches Power Query. This keeps the Power BI model simple and the transformation logic auditable.&lt;/p&gt;
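&lt;p&gt;A sketch of this kind of source-side cleaning against the &lt;code&gt;customers&lt;/code&gt; table (the specific transformations are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;select
    customer_id,
    concat(first_name, ' ', last_name)       as customer_name,   -- merge name fields
    lower(email)                             as email,           -- standardise casing
    coalesce(membership_status, 'unknown')   as membership_status, -- handle nulls
    to_char(registration_date, 'YYYY-MM-DD') as registration_date -- fix date format
from customers;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;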

&lt;h3&gt;
  
  
  SQL in the context of this schema
&lt;/h3&gt;

&lt;p&gt;With the &lt;code&gt;assignment&lt;/code&gt; database used throughout this guide, a Power BI analyst who understands SQL can write queries like the one below to pre-aggregate sales data before loading it, rather than importing all 15 raw transaction rows and computing totals inside DAX:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt;
    &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;membership_status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;revenue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sale_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;transactions&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;
&lt;span class="k"&gt;inner&lt;/span&gt; &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt;
&lt;span class="k"&gt;inner&lt;/span&gt; &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;membership_status&lt;/span&gt;
&lt;span class="k"&gt;order&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;revenue&lt;/span&gt; &lt;span class="k"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This query produces a compact summary table: joined, grouped, and sorted, ready for Power BI to visualise. An analyst who cannot write this query must load three raw tables, build the join in Power Query, and compute the aggregations with DAX. The result is the same, but the path is longer, more error-prone, and harder to debug.&lt;/p&gt;

&lt;h3&gt;
  
  
  Summary: SQL and Power BI as complementary layers
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Responsibility&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Storage &amp;amp; retrieval&lt;/td&gt;
&lt;td&gt;PostgreSQL&lt;/td&gt;
&lt;td&gt;Tables, indexes, relationships, data integrity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transformation&lt;/td&gt;
&lt;td&gt;SQL&lt;/td&gt;
&lt;td&gt;Filtering, joining, aggregating, cleaning at source&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Modeling&lt;/td&gt;
&lt;td&gt;Power BI Desktop&lt;/td&gt;
&lt;td&gt;Star schema, relationships, DAX measures&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Visualisation&lt;/td&gt;
&lt;td&gt;Power BI Desktop / Service&lt;/td&gt;
&lt;td&gt;Dashboards, reports, interactive charts&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;SQL and Power BI are not competing tools; they are complementary layers in the same analytical pipeline. SQL handles structured retrieval and transformation at the database level. Power BI handles interactive visualisation and self-service exploration at the consumer level. Fluency in both means the analyst decides, with full awareness, where each operation belongs.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Data Engineering &amp;amp; BI · Technical Guide · PostgreSQL · Aiven · Power BI Desktop&lt;/em&gt;&lt;/p&gt;

</description>
      <category>database</category>
      <category>postgres</category>
      <category>sql</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>SQL ANALYTICAL MECHANICS: JOINS &amp; WINDOW FUNCTIONS</title>
      <dc:creator>Dishon Gatambia (Dd)</dc:creator>
      <pubDate>Sun, 08 Mar 2026 15:17:34 +0000</pubDate>
      <link>https://dev.to/dishon_gatambiadd_31a1/sql-analytical-mechanics-joins-window-functions-pc4</link>
      <guid>https://dev.to/dishon_gatambiadd_31a1/sql-analytical-mechanics-joins-window-functions-pc4</guid>
      <description>&lt;p&gt;Structured Query Language (SQL) is the fundamental protocol for relational database management and data retrieval. Sophisticated analysis relies on two primary mechanisms: &lt;strong&gt;Joins&lt;/strong&gt; and &lt;strong&gt;Window&lt;/strong&gt; functions. Joins allow you to combine data from multiple tables, while window functions enable advanced calculations over subsets of data without collapsing rows. This technical overview details implementation strategies and optimisation for practitioners with foundational SQL knowledge.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Primary Mechanism&lt;/th&gt;
&lt;th&gt;Result Set Impact&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Joins&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Horizontal merging of distinct tables via shared keys.&lt;/td&gt;
&lt;td&gt;Alters row count and structure based on matching logic.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Window Functions&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Analytical computations across defined row subsets (windows).&lt;/td&gt;
&lt;td&gt;Preserves original row count; appends calculated data.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Core Principles
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Joins&lt;/strong&gt;: Essential for reconstructing normalized data structures. Operations include &lt;code&gt;INNER&lt;/code&gt;, &lt;code&gt;LEFT&lt;/code&gt;, &lt;code&gt;RIGHT&lt;/code&gt;, and &lt;code&gt;FULL OUTER&lt;/code&gt; joins to define the scope of the intersection or union.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Window Functions&lt;/strong&gt;: Utilised for ranking (&lt;code&gt;RANK()&lt;/code&gt;), running totals (&lt;code&gt;SUM() OVER&lt;/code&gt;), and time-series analysis (&lt;code&gt;LAG&lt;/code&gt;/&lt;code&gt;LEAD&lt;/code&gt;). The &lt;code&gt;OVER&lt;/code&gt; clause defines the window through &lt;code&gt;PARTITION BY&lt;/code&gt; and &lt;code&gt;ORDER BY&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What a Join actually does
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;JOIN&lt;/strong&gt; is the mechanism by which a relational database combines rows from two or more tables based on a related column. The engine performs a logical comparison between rows of a left and right dataset, then determines which row combinations satisfy the specified condition.&lt;br&gt;
The key insight: a &lt;strong&gt;JOIN&lt;/strong&gt; does not "&lt;em&gt;add columns to a table&lt;/em&gt;." It builds a new, temporary result set. Every row in that result is a pairing of rows from the participating tables. How many rows get included, and from which side, depends on the join type. Without joins, you'd be limited to single-table queries, which rarely suffice in real-world scenarios like e-commerce databases or customer relationship management systems.&lt;/p&gt;
&lt;h2&gt;
  
  
  Types of Joins
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Inner Joins
&lt;/h3&gt;

&lt;p&gt;Returns only the rows where the join condition is satisfied in both tables. Rows that have no match on either side are excluded entirely.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select e.name, d.department_name
from employees e 
inner join 
departments d on e.department_id = d.department_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What's happening&lt;/strong&gt; - For every row in the employees table, the query looks for matching rows in the departments table. If no match exists, that employee does not appear in the output. If an employee's department_id matches three rows in departments, that employee appears three times - one output row per pairing.&lt;/p&gt;
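The matching behaviour above can be sketched with Python's built-in sqlite3 module; the tables and sample rows here are invented for illustration, not from the article.

```python
# Minimal INNER JOIN demo: unmatched rows are dropped from the result.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER, department_name TEXT);
    CREATE TABLE employees   (name TEXT, department_id INTEGER);
    INSERT INTO departments VALUES (1, 'Engineering'), (2, 'Sales');
    INSERT INTO employees   VALUES ('Amina', 1), ('Brian', 2), ('Cate', 99);
""")

# Cate's department_id (99) has no match in departments, so she is excluded.
rows = conn.execute("""
    SELECT e.name, d.department_name
    FROM employees e
    INNER JOIN departments d ON e.department_id = d.department_id
    ORDER BY e.name
""").fetchall()

print(rows)  # [('Amina', 'Engineering'), ('Brian', 'Sales')]
```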

&lt;h3&gt;
  
  
  Left Join (Left Outer Join)
&lt;/h3&gt;

&lt;p&gt;Returns all rows from the left table, and the matched rows from the right table. Where no match exists on the right, &lt;em&gt;NULL&lt;/em&gt; fills the right-side columns.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select e.name, d.department_name
from employees e
left join departments d on e.department_id = d.department_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;When to use it&lt;/strong&gt;: Any time "missing" data is meaningful. If d.department_name is NULL, you know that employee has no assigned department. This is the join type to use for finding gaps.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;--Find employees with no assigned departments
select e.name, d.department_name
from departments d 
left join employees e on d.department_id = e.department_id
where d.department_name = NULL;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This returns employees with no assigned departments. &lt;/p&gt;
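A runnable sketch of this LEFT JOIN anti-join pattern, again using sqlite3 with invented data. It also demonstrates a classic pitfall: `= NULL` never matches in SQL because NULL comparisons evaluate to unknown, so the filter must use `IS NULL`.

```python
# LEFT JOIN + IS NULL finds unmatched rows; "= NULL" silently finds nothing.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER, department_name TEXT);
    CREATE TABLE employees   (name TEXT, department_id INTEGER);
    INSERT INTO departments VALUES (1, 'Engineering');
    INSERT INTO employees   VALUES ('Amina', 1), ('Brian', NULL);
""")

base = """
    SELECT e.name
    FROM employees e
    LEFT JOIN departments d ON e.department_id = d.department_id
    WHERE d.department_id {}
"""
print(conn.execute(base.format("= NULL")).fetchall())   # [] -- never matches
print(conn.execute(base.format("IS NULL")).fetchall())  # [('Brian',)]
```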

&lt;h3&gt;
  
  
  Right Join
&lt;/h3&gt;

&lt;p&gt;The mirror of &lt;strong&gt;LEFT JOIN&lt;/strong&gt;. All rows from the right table are preserved; &lt;em&gt;NULLs&lt;/em&gt; appear on the left where no match exists. In practice, most engineers rewrite RIGHT JOINs as LEFT JOINs by reversing table order because it's cleaner.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;--Finding employees and all departments (including Empty departments)
select e.name, d.department_name
from employees e 
right join departments d on e.department_id = d.department_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Full Outer Join
&lt;/h3&gt;

&lt;p&gt;Returns all rows from both tables. Where no match exists on either side, NULLs fill the unmatched columns. This is the union of the left and right outer joins: in short, it shows every row from both tables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;--Shows all rows from both tables
select e.name, d.department_name
from employees e 
full outer join departments d on e.department_id = d.department_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;When to use it&lt;/strong&gt;: Data reconciliation, audits, finding orphaned records on either side.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cross Join
&lt;/h3&gt;

&lt;p&gt;Produces the Cartesian product. Every row in the left table is paired with every row in the right table. No ON condition.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;--Every employee with every project 
select e.name, p.project_name 
from employees e 
cross join projects p;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A table with 10 rows crossed with a table of 5 rows produces 50 rows. Use deliberately. Running a CROSS JOIN on two large tables without a WHERE clause can produce billions of rows.&lt;/p&gt;
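The row-count arithmetic above is easy to verify with sqlite3 (tables and sizes invented for the check):

```python
# A CROSS JOIN of a 10-row table with a 5-row table yields 10 * 5 = 50 rows.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER)")
conn.execute("CREATE TABLE projects  (id INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?)", [(i,) for i in range(10)])
conn.executemany("INSERT INTO projects  VALUES (?)", [(i,) for i in range(5)])

(count,) = conn.execute(
    "SELECT COUNT(*) FROM employees CROSS JOIN projects"
).fetchone()
print(count)  # 50
```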

&lt;h3&gt;
  
  
  Self Join
&lt;/h3&gt;

&lt;p&gt;A table joined to itself. The table is aliased twice to distinguish which "instance" is being referenced.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;--Employee who is a manager 
select distinct m.name as manager 
from employees e 
join employees m on e.manager_id = m.employee_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnhrrdefjm4ym0clpn7oi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnhrrdefjm4ym0clpn7oi.png" alt="Illustration of JOINs" width="800" height="386"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Use cases for Joins
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reporting&lt;/strong&gt;: Combine customers' details from a customers table with their orders from an orders table to generate sales reports.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Integration&lt;/strong&gt;: Merge data from disparate sources, such as user profiles and activity logs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Cleaning&lt;/strong&gt;: Identify and handle missing relationships, like orphaned records.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Best practices include using aliases for table names to improve readability, indexing join columns for performance, and avoiding unnecessary joins to prevent query slowdowns.&lt;/p&gt;

&lt;h2&gt;
  
  
  Window Functions
&lt;/h2&gt;

&lt;p&gt;Window functions, introduced in the SQL:2003 standard and supported by most modern databases, including PostgreSQL, MySQL (from version 8.0), and SQL Server, perform calculations across a set of rows related to the current row. Unlike aggregate functions (e.g., &lt;code&gt;SUM()&lt;/code&gt;, &lt;code&gt;AVG()&lt;/code&gt;) that group rows and reduce output, window functions maintain the original row count while adding computed columns.&lt;br&gt;
With &lt;code&gt;GROUP BY + SUM()&lt;/code&gt;, you get one row per group. With a window function &lt;code&gt;SUM() OVER (PARTITION BY ...)&lt;/code&gt;, every original row survives in the output, but each row gains a new column containing the aggregate value computed over its "window".&lt;br&gt;
A window function is defined using the &lt;code&gt;OVER()&lt;/code&gt; clause, which specifies the "window" or partition of rows to operate on. It can include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PARTITION BY&lt;/strong&gt;: Divides the result set into partitions (groups) where the function is applied independently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ORDER BY&lt;/strong&gt;: Sorts rows within each partition.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frame Clause&lt;/strong&gt;: Defines a subset of the partition (e.g., ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW for running totals).&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Partition By
&lt;/h3&gt;

&lt;p&gt;Divides the result set into groups (partitions). The window function restarts its calculation for each partition. This is analogous to &lt;code&gt;GROUP BY&lt;/code&gt;, except the rows are not collapsed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Total sales per region, shown on every row
select
    region,
    salesperson,
    sales_amount,
    sum(sales_amount) over (partition by region) as region_total
from sales;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output will have one row per salesperson, but &lt;code&gt;region_total&lt;/code&gt; shows the sum for all salespeople in that region.&lt;/p&gt;
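This row-preserving behaviour can be checked with sqlite3 (requires SQLite 3.25+ for window functions; table and values invented):

```python
# PARTITION BY: three rows in, three rows out; the regional sum repeats
# on every row of its partition instead of collapsing the group.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, salesperson TEXT, sales_amount INTEGER);
    INSERT INTO sales VALUES
        ('East', 'Ann', 100), ('East', 'Ben', 200), ('West', 'Cy', 50);
""")

rows = conn.execute("""
    SELECT region, salesperson, sales_amount,
           SUM(sales_amount) OVER (PARTITION BY region) AS region_total
    FROM sales
    ORDER BY region, salesperson
""").fetchall()

print(rows)  # both East rows carry the East total of 300
```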

&lt;h3&gt;
  
  
  ORDER BY Inside OVER
&lt;/h3&gt;

&lt;p&gt;When ORDER BY is specified inside OVER(), the window function becomes aware of row sequence within each partition. For ranking functions, this determines rank. For running totals, it defines the cumulative direction.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Running total of sales per region, ordered by date
select
    region,
    sale_date,
    sales_amount,
    sum(sales_amount) over (
        partition by region
        order by sale_date
    ) as running_total
FROM sales;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Types of Window functions
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Ranking functions
&lt;/h2&gt;

&lt;p&gt;Ranking functions assign a rank or sequential number to each row within its partition.&lt;/p&gt;

&lt;h3&gt;
  
  
  Row_Number()
&lt;/h3&gt;

&lt;p&gt;Assigns a unique sequential integer to each row within a partition. Ties receive different numbers (non-deterministic unless the ORDER BY is fully deterministic).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Rank employees by salary within each department
select
    department,
    employee_name,
    salary,
    row_number() over (
        partition by department
        order by salary desc
    ) as row_num
from employees;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Use case&lt;/strong&gt;: Deduplication - filter &lt;code&gt;WHERE row_num = 1&lt;/code&gt; in an outer query (window functions cannot appear directly in &lt;code&gt;WHERE&lt;/code&gt;) to keep only the highest-paid employee per department.&lt;/p&gt;
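A minimal sketch of that deduplication idiom in sqlite3, with invented data; the window function runs in a subquery so the outer query can filter on it:

```python
# Keep only the top earner per department via ROW_NUMBER() in a subquery.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (department TEXT, employee_name TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Eng', 'Amina', 120), ('Eng', 'Brian', 90), ('Sales', 'Cate', 80);
""")

top = conn.execute("""
    SELECT department, employee_name, salary
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY department ORDER BY salary DESC
               ) AS row_num
        FROM employees
    )
    WHERE row_num = 1
    ORDER BY department
""").fetchall()

print(top)  # [('Eng', 'Amina', 120), ('Sales', 'Cate', 80)]
```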

&lt;h3&gt;
  
  
  Rank()
&lt;/h3&gt;

&lt;p&gt;Like ROW_NUMBER(), but ties receive the same rank, and the next rank skips. A tie at rank 2 means the next rank is 4 (not 3).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select
    department,
    employee_name,
    salary,
    rank() over (
        partition by department
        order by salary DESC
    ) as salary_rank
from employees;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Dense_Rank()
&lt;/h3&gt;

&lt;p&gt;Like RANK(), but no gaps after ties. A tie at rank 2 means the next rank is 3.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select
    department,
    employee_name,
    salary,
    dense_rank() over (
        partition by department
        order by salary desc
    ) as dense_rank
from employees;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Salary&lt;/th&gt;
&lt;th&gt;ROW_NUMBER&lt;/th&gt;
&lt;th&gt;RANK&lt;/th&gt;
&lt;th&gt;DENSE_RANK&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;90&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
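The tie-handling table above can be reproduced directly (salaries 100, 100, 90; sqlite3 demo with invented data):

```python
# ROW_NUMBER vs RANK vs DENSE_RANK on tied values.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?)", [(100,), (100,), (90,)])

rows = conn.execute("""
    SELECT salary,
           ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn,
           RANK()       OVER (ORDER BY salary DESC) AS rk,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS dk
    FROM employees
    ORDER BY salary DESC, rn
""").fetchall()

print(rows)  # [(100, 1, 1, 1), (100, 2, 1, 1), (90, 3, 3, 2)]
```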

&lt;h3&gt;
  
  
  NTILE(n)
&lt;/h3&gt;

&lt;p&gt;Divides the partition into &lt;code&gt;n&lt;/code&gt; buckets of as near-equal size as possible (when rows don't divide evenly, the larger buckets come first) and assigns a bucket number to each row.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Divide customers into quartiles by lifetime value
select
    customer_id,
    lifetime_value,
    ntile(4) over (order by lifetime_value desc) as value_quartile
from customers;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  LAG() and LEAD()
&lt;/h3&gt;

&lt;p&gt;Access values from a previous or subsequent row within the partition, without a self-join.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;LAG()&lt;/code&gt;: Returns a value from a previous row.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;LEAD()&lt;/code&gt;: Returns a value from a subsequent row.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Month-over-month revenue change
select
    month,
    revenue,
    lag(revenue, 1) over (order by month) as prev_month_revenue,
    revenue - lag(revenue, 1) over (order by month) as change
from monthly_revenue;

-- Pull the next month's revenue onto the current row
select
    month,
    revenue,
    lead(revenue, 1) over (order by month) as next_month_revenue
from monthly_revenue;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
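The month-over-month pattern can be sketched end to end with sqlite3; the monthly_revenue table and its figures are invented. Note that the first month has no predecessor, so LAG() and the derived change are NULL:

```python
# LAG(): month-over-month change without a self-join.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monthly_revenue (month TEXT, revenue INTEGER);
    INSERT INTO monthly_revenue VALUES
        ('2025-01', 100), ('2025-02', 130), ('2025-03', 120);
""")

rows = conn.execute("""
    SELECT month, revenue,
           LAG(revenue, 1) OVER (ORDER BY month) AS prev_month_revenue,
           revenue - LAG(revenue, 1) OVER (ORDER BY month) AS change
    FROM monthly_revenue
    ORDER BY month
""").fetchall()

print(rows)
# [('2025-01', 100, None, None), ('2025-02', 130, 100, 30), ('2025-03', 120, 130, -10)]
```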



&lt;h3&gt;
  
  
  FIRST_VALUE() and LAST_VALUE()
&lt;/h3&gt;

&lt;p&gt;Return the first or last value in an ordered window frame.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Show each sale alongside the first and most recent sale in that region
select
    region,
    sale_date,
    sales_amount,
    first_value(sales_amount) over (
        partition by region
        order by sale_date
    ) as first_sale,
    last_value(sales_amount) over (
        partition by region
        order by sale_date
        rows between unbounded preceding and unbounded following
    ) as last_sale
from sales;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Important&lt;/strong&gt;: &lt;code&gt;LAST_VALUE()&lt;/code&gt; requires an explicit frame clause &lt;code&gt;(ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)&lt;/code&gt;. Without it, the default frame ends at the current row; so &lt;code&gt;LAST_VALUE()&lt;/code&gt; would just return the current row's value.&lt;/p&gt;
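That frame caveat is observable in a few lines of sqlite3 (data invented): without the explicit frame, the first row's "last value" is just its own value, because its default window ends at the current row.

```python
# LAST_VALUE() with the default frame vs an explicit unbounded frame.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sale_date TEXT, sales_amount INTEGER);
    INSERT INTO sales VALUES ('2025-01-01', 10), ('2025-01-02', 30);
""")

rows = conn.execute("""
    SELECT sale_date,
           LAST_VALUE(sales_amount) OVER (ORDER BY sale_date) AS default_frame,
           LAST_VALUE(sales_amount) OVER (
               ORDER BY sale_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS full_frame
    FROM sales
    ORDER BY sale_date
""").fetchall()

print(rows)  # [('2025-01-01', 10, 30), ('2025-01-02', 30, 30)]
```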

&lt;h3&gt;
  
  
  Window Frames: ROWS vs RANGE
&lt;/h3&gt;

&lt;p&gt;The frame clause defines exactly which rows within the partition are included in the calculation relative to the current row.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Rolling 3-row average (current row + 2 preceding)
select
    sale_date,
    sales_amount,
    avg(sales_amount) over (
        order by sale_date
        rows between 2 preceding and current row
    ) as rolling_3_avg
from sales;

-- Rolling 7-day average (based on value range, not row count)
select
    sale_date,
    sales_amount,
    avg(sales_amount) over (
        order by sale_date
        range between interval '6 days' preceding and current row
    ) as rolling_7day_avg
from sales;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ROWS vs RANGE:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ROWS&lt;/strong&gt;: physical row count&lt;br&gt;
&lt;strong&gt;RANGE&lt;/strong&gt;: value-based range (treats tied ORDER BY values as the same position)&lt;/p&gt;
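The ROWS frame is straightforward to verify with sqlite3 (invented data; the RANGE-with-interval variant is PostgreSQL-style and not shown here, as SQLite does not support date-interval frames):

```python
# Rolling 3-row average: current row plus up to two preceding rows.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sale_date TEXT, sales_amount INTEGER);
    INSERT INTO sales VALUES
        ('2025-01-01', 10), ('2025-01-02', 20), ('2025-01-03', 30), ('2025-01-04', 40);
""")

rows = conn.execute("""
    SELECT sale_date, sales_amount,
           AVG(sales_amount) OVER (
               ORDER BY sale_date
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS rolling_3_avg
    FROM sales
    ORDER BY sale_date
""").fetchall()

print(rows)  # rolling averages: 10.0, 15.0, 20.0, 30.0
```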

&lt;h2&gt;
  
  
  Concept in plain terms
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;JOIN&lt;/strong&gt; is the act of stitching tables together. A relational database stores data in separate tables deliberately; orders in one table, customers in another, to avoid redundancy. JOINs are how you reassemble that normalised data into the shape a query needs. The join type controls what happens when no match exists: INNER drops the row, OUTER preserves it with NULLs.&lt;br&gt;
&lt;strong&gt;Window functions&lt;/strong&gt; are the answer to "I need an aggregate, but I don't want to lose the detail rows." The OVER() clause tells the database: compute this value, but do it within a sliding context tied to each row. The result is a hybrid row-level detail that coexists with group-level calculations in the same output. This makes complex analytics expressible in a single query that would otherwise require multiple subqueries, self-joins, or application-level processing.&lt;br&gt;
The two features address different problems. JOINs control the shape and completeness of the dataset. Window functions compute derived metrics across that dataset without changing its grain. Used together, they handle a wide range of analytical query patterns efficiently and without procedural code.&lt;/p&gt;

</description>
      <category>analytics</category>
      <category>data</category>
      <category>database</category>
      <category>sql</category>
    </item>
    <item>
      <title>How Analysts Translate Messy Data, DAX, and Dashboards into Action Using Power BI</title>
      <dc:creator>Dishon Gatambia (Dd)</dc:creator>
      <pubDate>Sun, 15 Feb 2026 12:03:45 +0000</pubDate>
      <link>https://dev.to/dishon_gatambiadd_31a1/how-analysts-translate-messy-data-dax-and-dashboards-into-action-using-power-bi-1l3</link>
      <guid>https://dev.to/dishon_gatambiadd_31a1/how-analysts-translate-messy-data-dax-and-dashboards-into-action-using-power-bi-1l3</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Data analysts utilise Power BI to convert fragmented, unorganised data into strategic business intelligence by managing a complete workflow of preparation, calculation, and visual communication. &lt;/p&gt;

&lt;h2&gt;
  
  
  Data Ingestion and Transformation
&lt;/h2&gt;

&lt;p&gt;Real-world data is typically disorganised, arriving with inconsistent formatting, missing values, and structural issues across various sources like ERP databases, spreadsheets, and APIs. Analysts use Power Query and the M language to build reproducible ETL (Extract, Transform, Load) pipelines. This process includes: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Structural Normalisation&lt;/strong&gt;: Tasks such as unpivoting tables, merging queries, and handling null values prepare data without needing database administrator assistance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schema Enforcement&lt;/strong&gt;: Converting data types (e.g., text to numeric) and standardising date formats to ISO 8601 prevents errors in later calculations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Relationship Modelling&lt;/strong&gt;: Analysts organise data into star schemas, where fact tables containing quantitative data connect to descriptive dimension tables via defined cardinalities&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  DAX: The Analytical Layer
&lt;/h2&gt;

&lt;p&gt;Data Analysis Expressions (DAX) allow analysts to move beyond basic table structures to perform dynamic calculations.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Calculated Columns vs. Measures&lt;/strong&gt;: Calculated columns are pre-computed and stored in memory for row-level categorisation.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Customer Segment = 
SWITCH(
    TRUE(),
    Sales[Total Amount] &amp;gt; 10000, "Enterprise",
    Sales[Total Amount] &amp;gt; 1000, "Mid-Market",
    "SMB"
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In contrast, measures are dynamic aggregations that compute only when filtered by visuals or slicers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Total Revenue = 
SUMX(
    Sales,
    Sales[Quantity] * Sales[Unit Price]
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context Manipulation&lt;/strong&gt;: The CALCULATE function is used to override or modify existing filters, enabling advanced comparisons like Year-over-Year (YoY) growth or high-value customer identification.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Revenue Previous Year = 
CALCULATE(
    [Total Revenue],
    DATEADD(Calendar[Date], -1, YEAR)
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;FILTER creates row contexts for granular conditional aggregation&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;High Value Customers Revenue = 
CALCULATE(
    [Total Revenue],
    FILTER(
        Customer,
        [Total Revenue] &amp;gt; 5000
    )
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Time Intelligence&lt;/strong&gt;: Specialised functions allow for calculations like Year-to-Date (YTD) revenue, provided a contiguous calendar table is present.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;YTD Revenue = 
TOTALYTD(
    [Total Revenue],
    Calendar[Date]
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Dashboard Construction and Interaction
&lt;/h2&gt;

&lt;p&gt;Dashboards serve as the interface that reduces cognitive load for stakeholders. Effective design relies on choosing the correct visual for the data type:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Visual Selection:&lt;/strong&gt; Bar charts are used for category rankings, line charts for temporal trends, and matrices for hierarchical drill-downs. Single KPI values are highlighted using card visuals with variance indicators.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interactivity&lt;/strong&gt;: Analysts configure how visuals interact—through cross-filtering or highlighting—and use bookmarks or parameters to enable "what-if" analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Translation to Action
&lt;/h2&gt;

&lt;p&gt;The final stage of the analytical workflow is converting observations into organisational impact.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pattern Recognition&lt;/strong&gt;: Analysts identify critical deviations, such as revenue falling below forecasts or elevated churn rates in specific segments.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Drill-Down Capabilities&lt;/strong&gt;: Tooltips and drill-through features allow users to investigate the raw transactions behind aggregate numbers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decision Support&lt;/strong&gt;: By sharing these insights through the Power BI Service or embedded reports, analysts provide decision-makers with the evidence needed to optimise resources, reduce costs, or expand markets.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Power BI empowers analysts to transform messy data into actionable intelligence through a seamless workflow of cleaning, calculating, visualising, and sharing. By mastering Power Query, DAX, and dashboard design, analysts bridge the gap between data and decision-making, driving organisational success.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>HOSPITAL &amp; PHARMACY DATA ANALYSIS</title>
      <dc:creator>Dishon Gatambia (Dd)</dc:creator>
      <pubDate>Sun, 15 Feb 2026 10:58:29 +0000</pubDate>
      <link>https://dev.to/dishon_gatambiadd_31a1/hospital-pharmacy-data-analysis-4eoi</link>
      <guid>https://dev.to/dishon_gatambiadd_31a1/hospital-pharmacy-data-analysis-4eoi</guid>
      <description>&lt;h2&gt;
  
  
  Data Cleaning
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Set every column to the correct data type, e.g. ID columns to text, dates to date, numbers to whole number, text fields to text, and unit price and cost to fixed decimal number. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Trimmed&lt;/strong&gt; and &lt;strong&gt;cleaned&lt;/strong&gt; all the columns&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;There were no missing values or errors to correct&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Removed any duplicate &lt;em&gt;Patient ID, Visit ID &amp;amp; Transaction ID&lt;/em&gt; values in the &lt;em&gt;Patients, Visits &amp;amp; Pharmacy Transactions&lt;/em&gt; tables respectively &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Capitalised&lt;/strong&gt; each word in cells&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scanned the data for negative values in age, quantity &amp;amp; cost; none were found&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Added a validation column, &lt;em&gt;Calculated total cost&lt;/em&gt; = &lt;em&gt;Unit price&lt;/em&gt; * &lt;em&gt;Quantity&lt;/em&gt;. It matched the given &lt;em&gt;total cost&lt;/em&gt; on every row, so the new column was deleted&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Data Modelling
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;For this dataset, my &lt;strong&gt;fact tables&lt;/strong&gt; are: &lt;em&gt;Visits and Pharmacy transactions&lt;/em&gt; as they contain quantitative data while my &lt;strong&gt;dimension table&lt;/strong&gt; is &lt;em&gt;patients&lt;/em&gt; as it contains descriptive data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;strong&gt;unique identifiers&lt;/strong&gt; are &lt;em&gt;patientID&lt;/em&gt; in the &lt;em&gt;patients&lt;/em&gt; table, &lt;em&gt;visitID&lt;/em&gt; in &lt;em&gt;visits&lt;/em&gt;, and &lt;em&gt;transactionID&lt;/em&gt; in &lt;em&gt;pharmacy transactions&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To create the &lt;strong&gt;relationships&lt;/strong&gt;, link &lt;em&gt;patientID&lt;/em&gt; in the &lt;em&gt;patients&lt;/em&gt; and &lt;em&gt;visits&lt;/em&gt; tables. This is a many-to-one relationship from &lt;em&gt;visits&lt;/em&gt; to &lt;em&gt;patients&lt;/em&gt;: one patient can have many visits in the hospital. For &lt;em&gt;pharmacy transactions&lt;/em&gt; to &lt;em&gt;visits&lt;/em&gt;, the connecting column is &lt;em&gt;visitID&lt;/em&gt;, again many-to-one: one visit can have many transactions, but not vice versa.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Created another table called &lt;em&gt;DimDrug&lt;/em&gt; that contains &lt;em&gt;Drug Name, Drug Category &amp;amp; Drug ID&lt;/em&gt;. Referenced the &lt;em&gt;pharmacy transactions&lt;/em&gt; table and went ahead to delete the other columns and remained with &lt;em&gt;Drug Name &amp;amp; Drug Category&lt;/em&gt;. Added an indexed column and formatted a prefix "DRG" to come up with the column &lt;em&gt;Drug ID&lt;/em&gt; eg "DRG 1". Removed duplicates hence remained with six drug names and drug categories. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Created another table &lt;em&gt;DimDate&lt;/em&gt; from the referenced table &lt;em&gt;Visits&lt;/em&gt;. Deleted all columns except the &lt;em&gt;visit date&lt;/em&gt; column, then added columns from &lt;strong&gt;date &amp;amp; time&lt;/strong&gt; to return &lt;em&gt;month, month name, quarter &amp;amp; year&lt;/em&gt; &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Created 2 tables from the &lt;em&gt;visits&lt;/em&gt; table namely &lt;em&gt;FacVisits &amp;amp; DimVisits&lt;/em&gt;. &lt;em&gt;FacVisits&lt;/em&gt; is a fact table containing &lt;em&gt;patientID, VisitID, Visit date &amp;amp; Length of stay days&lt;/em&gt; these are measurable data hence fact table. The &lt;em&gt;DimVisits&lt;/em&gt; table contains &lt;em&gt;VisitID, Diagnosis &amp;amp; Department&lt;/em&gt;, these are descriptive data hence dimension table&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;In the &lt;em&gt;Pharmacy Transactions&lt;/em&gt; table, added &lt;em&gt;Drug ID&lt;/em&gt; by adding a &lt;strong&gt;conditional column&lt;/strong&gt; and using an &lt;strong&gt;IF &amp;amp; Else&lt;/strong&gt; statement that returns the correct drug ID paired to the drug name from the &lt;em&gt;DimDrug&lt;/em&gt; table. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Merged queries using &lt;em&gt;visitID&lt;/em&gt;, present in both the &lt;em&gt;Pharmacy Transactions &amp;amp; Visits&lt;/em&gt; tables, using a &lt;strong&gt;left outer merge&lt;/strong&gt;. Added &lt;em&gt;Patient_ID&lt;/em&gt; from the &lt;em&gt;Visits&lt;/em&gt; table into &lt;em&gt;Pharmacy Transactions&lt;/em&gt;. From &lt;em&gt;Patients&lt;/em&gt; to &lt;em&gt;Pharmacy transactions&lt;/em&gt;, &lt;em&gt;Patient ID&lt;/em&gt; forms a one-to-many relationship.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For the relationships:&lt;br&gt;
&lt;em&gt;(Visit_ID)Visit table&lt;/em&gt; - &lt;em&gt;(Visit_ID)Pharmacy transactions&lt;/em&gt; = one-many&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;(Visit_ID)FacVisits&lt;/em&gt; - &lt;em&gt;(Visit_ID)Pharmacy transactions&lt;/em&gt; = one-many&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;(Visit_ID)DimVisits&lt;/em&gt; - &lt;em&gt;(Visit_ID)Pharmacy transactions&lt;/em&gt; = one-many&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;(Visit_ID)DimDate&lt;/em&gt; - &lt;em&gt;(Visit_ID)Pharmacy transactions&lt;/em&gt; = one-many&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;(Drug_ID)DimDrug&lt;/em&gt; - &lt;em&gt;(Drug_ID)Pharmacy transactions&lt;/em&gt; = one-many&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;(Patient_ID)Patient&lt;/em&gt; - &lt;em&gt;(Patient_ID)Pharmacy transactions&lt;/em&gt; = one-many&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;(Visit_ID)Visit table&lt;/em&gt; - &lt;em&gt;(Visit_ID)FacVisits&lt;/em&gt; = one-one&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;(Visit_ID)Visit table&lt;/em&gt; - &lt;em&gt;(Visit_ID)DimDate&lt;/em&gt; = one-one&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;(Visit_ID)Visit table&lt;/em&gt; - &lt;em&gt;(Visit_ID)DimVisits&lt;/em&gt; = one-one&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Data Analysis
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;To visualise the diseases most common across counties&lt;/strong&gt;, used a &lt;strong&gt;matrix table&lt;/strong&gt;. The rows had the &lt;em&gt;county&lt;/em&gt; from the &lt;em&gt;patients&lt;/em&gt; table, and the columns were the &lt;em&gt;diseases&lt;/em&gt; from the &lt;em&gt;visits&lt;/em&gt; table. The value field was the count of &lt;em&gt;visitID&lt;/em&gt;. Formatted the cells to show red colour for the max values, white colour for middle and blue colour for the minimum values. From the chart, &lt;strong&gt;Typhoid &amp;amp; Diabetes are common in Kiambu, Hypertension is most common in Kisumu, Diabetes is most common in Mombasa, Pneumonia is most common in the capital city and the flu is most common in Nakuru and Uasin Gishu.&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;To visualise which departments generate a higher pharmacy revenue&lt;/strong&gt; - used a pie chart with the value being the &lt;strong&gt;total cost&lt;/strong&gt; &amp;amp; the legend being the &lt;strong&gt;department&lt;/strong&gt;. From the charts, &lt;strong&gt;Inpatient&amp;gt;&amp;gt;&amp;gt;Emergency&amp;gt;&amp;gt;&amp;gt;Outpatient&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;To visualise which age groups consume the most drugs&lt;/strong&gt; - used a stacked bar chart. Used a &lt;strong&gt;Switch&lt;/strong&gt; function to come up with a column that will group ages into:&lt;br&gt;
&amp;lt;1yr - Infant&lt;br&gt;
1-14yrs - Child&lt;br&gt;
15-44yrs - Young adult&lt;br&gt;
45-59yrs - Middle Age&lt;br&gt;
60-74yrs - Elderly&lt;br&gt;
75+yrs - Senior&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Age_Group = SWITCH(
    TRUE(),
    [Age] &amp;lt; 1, "Infant",
    [Age] &amp;gt;= 1 &amp;amp;&amp;amp; [Age] &amp;lt;= 14, "Child",
    [Age] &amp;gt;= 15 &amp;amp;&amp;amp; [Age] &amp;lt;= 44, "Young Adult",
    [Age] &amp;gt;= 45 &amp;amp;&amp;amp; [Age] &amp;lt;= 59, "Middle-age",
    [Age] &amp;gt;= 60 &amp;amp;&amp;amp; [Age] &amp;lt;= 74, "Elderly",
    [Age] &amp;gt;= 75, "Senior",
    "Unknown")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
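&lt;p&gt;For readers more comfortable in code, the cascading conditions of &lt;code&gt;SWITCH(TRUE(), ...)&lt;/code&gt; map directly onto an if-chain where the first matching branch wins. A small Python sketch of the same bucketing (the "Unknown" fallback for blank ages is omitted here):&lt;/p&gt;

```python
def age_group(age):
    # Mirrors the DAX SWITCH(TRUE(), ...) column: first matching branch wins
    if age < 1:
        return "Infant"
    if age <= 14:
        return "Child"
    if age <= 44:
        return "Young Adult"
    if age <= 59:
        return "Middle-age"
    if age <= 74:
        return "Elderly"
    return "Senior"

print([age_group(a) for a in [0, 7, 30, 50, 70, 80]])
# ['Infant', 'Child', 'Young Adult', 'Middle-age', 'Elderly', 'Senior']
```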



&lt;p&gt;Used a stacked bar chart of age group vs the drug count. &lt;br&gt;
From the chart, &lt;strong&gt;young adults use more drugs than any other age group&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;To visualise whether a high number of patients always leads to higher pharmacy revenue,&lt;/strong&gt; I used a combo chart. I created a new measure, &lt;code&gt;Total Visits = COUNTROWS(Visits)&lt;/code&gt;, which counts the visits made to the facility. The columns were the total cost, the line was the total visits, and the x-axis was the months. For statistical rigour, I also plotted a scatter plot where the x-axis was total visits, the y-axis was total cost, and the points were coloured by department. From the charts, &lt;strong&gt;we can conclude that a high number of patients leads to higher pharmacy revenue, as evidenced by the peak months of July &amp;amp; August. The scatter plot has a positive gradient, confirming the inference.&lt;/strong&gt;  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;To visualise whether some diagnoses are associated with longer hospital stays but less pharmacy spending&lt;/strong&gt;, I plotted a combo chart. The x-axis was the diagnosis, the column was the total cost, and the line was the total length of stay. From the chart, &lt;strong&gt;typhoid is associated with a longer hospital stay and higher pharmacy spending, while the flu is associated with a longer hospital stay but little pharmacy spending&lt;/strong&gt;.   &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Dashboard
&lt;/h2&gt;

&lt;p&gt;KPI cards were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Total visits - the &lt;em&gt;Total Visits&lt;/em&gt; measure, i.e. &lt;code&gt;Total Visits = COUNTROWS(Visits)&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Total pharmacy revenue - the &lt;em&gt;Total Revenue&lt;/em&gt; measure, i.e. &lt;code&gt;Total Revenue = SUM(Pharmacy_Transactions[Total_Cost])&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Average days spent - the &lt;em&gt;Average days stayed&lt;/em&gt; measure, i.e. &lt;code&gt;Average days stayed = AVERAGE(Visits[Length_of_Stay_Days])&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Plotted charts showing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Disease trends over time - a line chart where the x-axis is the quarter, the y-axis is the count of diagnoses, and the legend is the diagnosis&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Pharmacy cost breakdown by category - a stacked bar chart where the x-axis is the diagnosis and the y-axis is total revenue&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;County and department comparisons - a stacked bar chart where the x-axis is the county, the y-axis is the count of visits, and the legend is the department&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Slicers for county, department, and visit date&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>analytics</category>
      <category>data</category>
      <category>datascience</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>SCHEMAS AND DATA MODELLING - POWER BI</title>
      <dc:creator>Dishon Gatambia (Dd)</dc:creator>
      <pubDate>Tue, 03 Feb 2026 14:45:34 +0000</pubDate>
      <link>https://dev.to/dishon_gatambiadd_31a1/schemas-and-data-modelling-power-bi-551g</link>
      <guid>https://dev.to/dishon_gatambiadd_31a1/schemas-and-data-modelling-power-bi-551g</guid>
      <description>&lt;p&gt;Data modelling is a foundational aspect of Power BI that determines how data is structured, related, and queried for analysis. Data modelling defines the structural relationship between tables to ensure query performance and reporting accuracy. This article explores key concepts like star and snowflake schemas, fact and dimension tables, relationships, and why prioritizing good design is essential for optimal performance and reliable reporting.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Star Schema
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;star schema&lt;/strong&gt; is the standard for Power BI. It utilises a central fact table connected to multiple dimension tables.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Fact Tables&lt;/em&gt;: Contain quantitative metrics (e.g., sales, temperature) and foreign keys. They define the model's granularity.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Dimension Tables&lt;/em&gt;: Contain descriptive attributes such as business attributes (e.g., product names, dates). They provide the context for filtering and grouping.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Relationships in a star schema are typically one-to-many, flowing from the dimension table to the fact table. This configuration minimizes query complexity. Usually the fact tables represent the "many" while dimension tables represent the "one" aspect in the relationships. &lt;/p&gt;
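&lt;p&gt;To make the one-to-many idea concrete, here is a minimal sketch in Python/pandas with invented product and sales tables; &lt;code&gt;validate="many_to_one"&lt;/code&gt; asserts the star-schema cardinality at join time:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical dimension table: one row per product (the "one" side)
dim_product = pd.DataFrame({
    "Product_ID": [1, 2],
    "Product_Name": ["Widget", "Gadget"],
})

# Hypothetical fact table: many rows per product (the "many" side)
fact_sales = pd.DataFrame({
    "Product_ID": [1, 1, 2],
    "Amount": [10.0, 20.0, 5.0],
})

# A typical star-schema query: join the dimension onto the fact,
# then group by a descriptive attribute to aggregate the metric
joined = fact_sales.merge(dim_product, on="Product_ID", validate="many_to_one")
totals = joined.groupby("Product_Name")["Amount"].sum()
print(totals)
```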

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk1da822l3awcphbhfxld.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk1da822l3awcphbhfxld.png" alt="An illustration of the star schema showing a central fact table surrounded by multiple dimension tables" width="800" height="546"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Snowflake Schema
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;snowflake schema&lt;/strong&gt; normalises dimension tables into sub-dimensions (e.g., a product dimension split into separate tables for category, subcategory, and product details). While this reduces data redundancy at the source, it is inefficient for Power BI. This is because it results in more tables, longer filter propagation chains, increased model complexity, and poorer performance due to additional joins. &lt;br&gt;
Snowflake schemas may be useful in specific scenarios (e.g., when source data is heavily normalised or storage is a major constraint).&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Performance Impact&lt;/em&gt;: Increased table counts and longer filter propagation chains degrade speed.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Recommendation&lt;/em&gt;: Denormalise data into single-dimension tables to simplify the semantic model and improve usability.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7cppsf0qku2etfgsfy0o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7cppsf0qku2etfgsfy0o.png" alt="Illustration of the Snowflake schema" width="318" height="159"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Fact and Dimension Tables
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fact tables&lt;/strong&gt; contain measurable data (e.g., sales orders, inventory levels) with foreign keys linking to dimensions and numeric columns for aggregation. They grow over time and define the model's granularity—ensuring consistency is critical to avoid inaccurate summaries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dimension tables&lt;/strong&gt; hold descriptive attributes (e.g., product names, customer details, dates) with a unique key (often a surrogate key for handling changes like slowly changing dimensions). They are smaller and support hierarchies for drilling down in reports.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Relationships in Power BI
&lt;/h2&gt;

&lt;p&gt;Relationships define how tables connect and how filters propagate. &lt;br&gt;
&lt;em&gt;Directionality&lt;/em&gt;: Single-direction filter propagation is preferred. Bi-directional and many-to-many relationships should be limited, as they increase logic complexity and slow queries.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Active vs. Inactive&lt;/em&gt;: Active relationships are the default path for filter propagation. Role-playing dimensions (e.g., multiple date types) should be handled via separate tables rather than complex inactive relationship chains.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Good Modelling Is Critical
&lt;/h2&gt;

&lt;p&gt;Poor data modelling leads to bloated models, slow report rendering, inaccurate results (e.g., from inconsistent granularity), and a confusing user experience. A well-designed star schema optimises compression, reduces query complexity, and scales better for large datasets.&lt;br&gt;
In contrast, flat tables or excessive snowflaking increase redundancy or joins, hurting refresh times and report interactivity. Best practices include using Power Query for transformations, surrogate keys, and avoiding unnecessary columns to keep models lean.&lt;br&gt;
In summary, prioritising star schema principles in Power BI data modelling delivers faster, more reliable insights—making it a cornerstone for any serious BI implementation. &lt;/p&gt;

</description>
      <category>analytics</category>
      <category>data</category>
      <category>dataengineering</category>
      <category>microsoft</category>
    </item>
    <item>
      <title>Getting Started with Git &amp; Github</title>
      <dc:creator>Dishon Gatambia (Dd)</dc:creator>
      <pubDate>Sun, 25 Jan 2026 19:17:09 +0000</pubDate>
      <link>https://dev.to/dishon_gatambiadd_31a1/getting-started-with-git-github-3kcc</link>
      <guid>https://dev.to/dishon_gatambiadd_31a1/getting-started-with-git-github-3kcc</guid>
      <description>&lt;h1&gt;
  
  
  &lt;strong&gt;What is Git?&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Git is a distributed version control system capable of managing versions of source code or data. It is often used by programmers developing software collaboratively.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Why is version control important?&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;Version control (also known as source control or revision control) is a system that records and manages changes to a file or set of files, most commonly source code, over time.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The importance of version control is listed below: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Safety: you can undo mistakes without losing work.&lt;/li&gt;
&lt;li&gt;Collaboration: multiple people can work on the same codebase without overwriting each other.&lt;/li&gt;
&lt;li&gt;Accountability: history shows who changed what and why (commit messages).&lt;/li&gt;
&lt;li&gt;Experimentation: branches let you try features without risking the main code.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Pulling code from Github (step by step guide)&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;Code retrieval from GitHub is categorized into two primary operations: Initial Acquisition (Clone) and Synchronization (Pull)&lt;/em&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Clone:
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Used when the code doesn't exist on the local machine&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Identify Repository URL:&lt;/strong&gt; On the GitHub repository page, click the Code button and copy the URL (HTTPS or SSH).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Initialize Download:&lt;/strong&gt; Open the terminal and execute: &lt;code&gt;git clone &amp;lt;repository-url&amp;gt;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Git creates a directory named after the repository, downloads all files, branches, and the full commit history, and configures the remote reference (origin).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Pull/Synchronization
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Used to update an existing local repository with changes from Github&lt;/em&gt;&lt;br&gt;
&lt;strong&gt;- Navigate to Directory:&lt;/strong&gt; Enter the local project folder.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Execute Update:&lt;/strong&gt; Run the following command: &lt;code&gt;git pull origin &amp;lt;branch-name&amp;gt;&lt;/code&gt;  Example: &lt;code&gt;git pull origin main&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Mechanism:&lt;/strong&gt; This command executes two sub-operations:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fetch:&lt;/strong&gt; Downloads remote data without altering local files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Merge:&lt;/strong&gt; Integrates remote changes into the current active branch.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Pushing code to Github (step-by-step guide)&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;Pushing code to GitHub requires a staged sequence: preparing files, creating a local snapshot, and transmitting data to the remote server.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Initial Upload (New Repository)
&lt;/h3&gt;

&lt;p&gt;Use this sequence to link a local project to a newly created GitHub repository.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Create Remote Repository:&lt;/strong&gt; On GitHub, create a new repository. Do not initialize with a README, .gitignore, or license if code already exists locally.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Initialize Local Git:&lt;/strong&gt; Navigate to the project root and execute: &lt;code&gt;git init&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Stage Files:&lt;/strong&gt; Add all files to the staging area: &lt;code&gt;git add .&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Create First Commit:&lt;/strong&gt; Record the snapshot: &lt;code&gt;git commit -m "initial commit"&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Define Main Branch:&lt;/strong&gt; Ensure the primary branch is named &lt;code&gt;main&lt;/code&gt;: &lt;code&gt;git branch -M main&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Link Remote URL:&lt;/strong&gt; Connect the local repository to GitHub: &lt;code&gt;git remote add origin &amp;lt;github-repo-url&amp;gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Execute Push:&lt;/strong&gt; Upload the code and set the upstream reference: &lt;code&gt;git push -u origin main&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2.  Standard Synchronization (Existing Repository)
&lt;/h3&gt;

&lt;p&gt;Use this sequence for ongoing updates to a repository that is already linked.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage Specific Changes:&lt;/strong&gt; &lt;code&gt;git add &amp;lt;file-name&amp;gt;&lt;/code&gt; (or &lt;code&gt;git add .&lt;/code&gt; for all changes)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Commit:&lt;/strong&gt; &lt;code&gt;git commit -m "description of changes"&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Push:&lt;/strong&gt; &lt;code&gt;git push&lt;/code&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Git Tracking (step-by-step guide)
&lt;/h1&gt;

&lt;p&gt;Git tracks changes by managing data between three logical states: &lt;strong&gt;the Working Directory (unsaved changes),&lt;/strong&gt; &lt;strong&gt;the Staging Area (prepared changes),&lt;/strong&gt; and the &lt;strong&gt;Local Repository (permanent history).&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Initialization
&lt;/h3&gt;

&lt;p&gt;Activate Git tracking in a project directory: &lt;code&gt;git init&lt;/code&gt;. This creates a &lt;code&gt;.git&lt;/code&gt; subdirectory to store metadata and object databases.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. State Verification
&lt;/h3&gt;

&lt;p&gt;Determine the current status of files: &lt;code&gt;git status&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Untracked:&lt;/strong&gt; New files unknown to Git.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Modified:&lt;/strong&gt; Tracked files with unsaved changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Staged:&lt;/strong&gt; Changes moved to the Index, ready for the next snapshot.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Change Preparation (Staging)
&lt;/h3&gt;

&lt;p&gt;Select specific changes for inclusion in the next version:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Single file: &lt;code&gt;git add &amp;lt;file-name&amp;gt;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;All changes: &lt;code&gt;git add .&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Interactive (partial file): &lt;code&gt;git add -p&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Version Finalization (Commit)
&lt;/h3&gt;

&lt;p&gt;Record the staged changes as a permanent snapshot: &lt;code&gt;git commit -m "Direct description of change"&lt;/code&gt;. &lt;em&gt;Standard format:&lt;/em&gt; Use the imperative mood (e.g., "Fix logic error" rather than "Fixed logic error").&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Change Analysis Tools
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Real-time Differences
&lt;/h4&gt;

&lt;p&gt;Analyze modifications before staging or committing:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Unstaged changes:&lt;/strong&gt; &lt;code&gt;git diff&lt;/code&gt; (Compares working directory to staging area)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Staged changes:&lt;/strong&gt; &lt;code&gt;git diff --staged&lt;/code&gt; (Compares staging area to last commit)&lt;/p&gt;

&lt;h4&gt;
  
  
  Historical Review
&lt;/h4&gt;

&lt;p&gt;Inspect the chronological record of changes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Summary list:&lt;/strong&gt; &lt;code&gt;git log --oneline&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- Detailed patches:&lt;/strong&gt; &lt;code&gt;git log -p&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;- File-specific history:&lt;/strong&gt; &lt;code&gt;git log -- &amp;lt;file-path&amp;gt;&lt;/code&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Line-Level Attribution
&lt;/h4&gt;

&lt;p&gt;Identify when and by whom specific lines were altered: &lt;code&gt;git blame &amp;lt;file-name&amp;gt;&lt;/code&gt;&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>git</category>
      <category>github</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to Excel in Data Analytics: A Beginner's Guide - MS Excel for Data Analytics</title>
      <dc:creator>Dishon Gatambia (Dd)</dc:creator>
      <pubDate>Sun, 25 Jan 2026 19:13:22 +0000</pubDate>
      <link>https://dev.to/dishon_gatambiadd_31a1/how-to-excel-in-data-analytics-a-beginners-guide-ms-excel-for-data-analytics-3jl2</link>
      <guid>https://dev.to/dishon_gatambiadd_31a1/how-to-excel-in-data-analytics-a-beginners-guide-ms-excel-for-data-analytics-3jl2</guid>
      <description>&lt;p&gt;MS Excel is one of the most beginner-friendly tools for data analysis. It allows users to manipulate, process, and visualise large datasets efficiently, turning raw data into meaningful insights.  &lt;/p&gt;

&lt;h2&gt;
  
  
  Why MS Excel for data analytics
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Useful for many entry-level analysis tasks, e.g. formulas, tables, conditional formatting, charts, PivotTables, etc.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It's fast and easy&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Widely available in many institutions, workplaces and schools.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What we will learn in this article
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Data cleaning: Removing duplicates, sorting &amp;amp; filtering&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Simple formulas &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Conditional formatting&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data Visualisation: Tables, Charts, Pivot tables&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Data Cleaning
&lt;/h2&gt;

&lt;p&gt;Before data analysis commences, it's important to clean and organize the dataset to ensure accuracy. The common cleaning tasks are: &lt;/p&gt;

&lt;h3&gt;
  
  
  Removing duplicates
&lt;/h3&gt;

&lt;p&gt;Use &lt;strong&gt;Data &amp;gt; Remove Duplicates&lt;/strong&gt;. This removes redundancy in a dataset. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fndnyabuf3fne05kx5rs7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fndnyabuf3fne05kx5rs7.png" alt=" " width="793" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Sorting Data
&lt;/h3&gt;

&lt;p&gt;Sorting data makes it easier to immediately view and comprehend your data, organize and locate the facts you need, and ultimately help you make better decisions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgunm8ney9ybwf7v3o1v1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgunm8ney9ybwf7v3o1v1.png" alt=" " width="738" height="453"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Filtering Data
&lt;/h3&gt;

&lt;p&gt;This function displays only the rows that match specified criteria, hiding the rest. &lt;br&gt;
Select any column in the table, then go to the Data tab on the ribbon and, in the Sort &amp;amp; Filter group, select Filter.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fht4gtzcbae7uwcn6getf.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fht4gtzcbae7uwcn6getf.jpg" alt=" " width="450" height="445"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Simple Formulas
&lt;/h2&gt;

&lt;p&gt;Built-in functions in Excel enable users to perform calculations easily. Below are some examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;h3&gt;
  
  
  Sum
&lt;/h3&gt;

&lt;p&gt;Adds a range of numbers: &lt;code&gt;=SUM(A2:A877)&lt;/code&gt;. This adds all values from cell A2 to A877.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;h3&gt;
  
  
  Average
&lt;/h3&gt;

&lt;p&gt;Calculates the average of numbers: &lt;code&gt;=AVERAGE(A2:A877)&lt;/code&gt;. This displays the average of the values between cells A2 and A877.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;h3&gt;
  
  
  Count
&lt;/h3&gt;

&lt;p&gt;Counts how many cells contain numbers: &lt;code&gt;=COUNT(A2:A877)&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;h3&gt;
  
  
  CountA
&lt;/h3&gt;

&lt;p&gt;Counts cells that aren't empty: &lt;code&gt;=COUNTA(A2:A877)&lt;/code&gt;&lt;br&gt;
There are many more formulas in Excel; the above are just a few of them.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
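&lt;p&gt;For comparison, the same aggregations can be sketched in Python with pandas (a hypothetical column standing in for A2:A877). Like Excel, pandas skips empty cells when summing and averaging:&lt;/p&gt;

```python
import pandas as pd

# Hypothetical numeric column standing in for A2:A877, with one empty cell
col = pd.Series([10, 20, None, 40])

print(col.sum())               # like =SUM(...):     70.0
print(col.mean())              # like =AVERAGE(...): mean of the 3 filled cells
print(col.count())             # like =COUNT(...):   3 numeric cells
print(int(col.notna().sum()))  # like =COUNTA(...):  3 non-empty cells
```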

&lt;h2&gt;
  
  
  Conditional Formatting
&lt;/h2&gt;

&lt;p&gt;Trends and patterns in a dataset can be highlighted for easier visualisation.&lt;br&gt;
&lt;strong&gt;Step 1: Go to Home &amp;gt; Conditional Formatting.&lt;/strong&gt;&lt;br&gt;
Select any column in the table. Then go to the Home tab on the ribbon and, in the Styles group, select Conditional Formatting; under Highlight Cells Rules, choose the Greater Than option. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftzkdtrbzqgqgwndgju7n.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftzkdtrbzqgqgwndgju7n.jpeg" alt=" " width="800" height="369"&gt;&lt;/a&gt;&lt;br&gt;
Then a &lt;strong&gt;Greater Than&lt;/strong&gt; dialog box appears. Enter the cutoff value (here, for the quarter column) and then select the colour.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmq9s95rw9ndh9f7iasqc.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmq9s95rw9ndh9f7iasqc.jpeg" alt=" " width="800" height="369"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Step 2: Preview Result&lt;/strong&gt;&lt;br&gt;
As you can see in the excel table &lt;strong&gt;'quarter' column changes the colour of the values that are greater than 6.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwo0dpuss92sa1v6ur9u1.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwo0dpuss92sa1v6ur9u1.jpeg" alt=" " width="800" height="369"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Charts
&lt;/h2&gt;

&lt;p&gt;Any set of information can be represented graphically using a chart. Excel offers several chart types to choose from. Charts make it easier to identify trends and relationships. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select your dataset and go to Insert &amp;gt; Charts.&lt;/li&gt;
&lt;li&gt;Choose from bar charts, line charts, or pie charts.&lt;/li&gt;
&lt;li&gt;Customize the chart for clarity and impact.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz70ttia4lf3ty7fjyv2h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz70ttia4lf3ty7fjyv2h.png" alt=" " width="479" height="553"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Pivot Tables
&lt;/h2&gt;

&lt;p&gt;So what are pivot tables? They summarise, analyse, and reorganise large datasets without formulas, letting you drag and drop fields to build interactive reports. &lt;br&gt;
&lt;strong&gt;Steps to create a Pivot Table:&lt;/strong&gt;&lt;br&gt;
Select your &lt;strong&gt;data&lt;/strong&gt;&lt;br&gt;
Go to &lt;strong&gt;Insert &amp;gt; PivotTable&lt;/strong&gt;&lt;br&gt;
Drag fields into &lt;strong&gt;Rows, Columns, and Values&lt;/strong&gt;&lt;br&gt;
Example:&lt;/p&gt;

&lt;p&gt;Rows &amp;gt; Department&lt;br&gt;
Values &amp;gt; Count of Expense Type&lt;br&gt;
The count of expense type will automatically be done by Excel.&lt;/p&gt;
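&lt;p&gt;The Department/Expense Type example above can be sketched in pandas, whose &lt;code&gt;pivot_table&lt;/code&gt; mirrors Excel's drag-and-drop fields (sample data is invented):&lt;/p&gt;

```python
import pandas as pd

# Hypothetical expense records
expenses = pd.DataFrame({
    "Department": ["HR", "HR", "IT"],
    "Expense Type": ["Travel", "Training", "Hardware"],
})

# Rows well -> Department, Values well -> Count of Expense Type
pivot = expenses.pivot_table(index="Department",
                             values="Expense Type",
                             aggfunc="count")
print(pivot)
```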

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9dvf74f8lx6izfms99aq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9dvf74f8lx6izfms99aq.png" alt=" " width="800" height="439"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Dashboards
&lt;/h2&gt;

&lt;p&gt;Finally, with all these tools and more, we can now build dashboards. Dashboards are paramount in data analysis. So what are dashboards? Like a car's dashboard, which displays the vehicle's combined data, they consolidate key business metrics, charts, and tables into a single view for easy analysis. A dashboard typically consists of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Charts&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Slicers (interactive filters)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Pivot tables&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo7jdk8wsedjjl2ph6554.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo7jdk8wsedjjl2ph6554.jpg" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>analytics</category>
      <category>beginners</category>
      <category>data</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
