<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Charles</title>
    <description>The latest articles on DEV Community by Charles (@charles_ndungu).</description>
    <link>https://dev.to/charles_ndungu</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3711276%2F8ca2a451-2bc3-4522-beb1-86d3f397b4b8.png</url>
      <title>DEV Community: Charles</title>
      <link>https://dev.to/charles_ndungu</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/charles_ndungu"/>
    <language>en</language>
    <item>
      <title>Connecting PostgreSQL to Power BI for Data Analysis</title>
      <dc:creator>Charles</dc:creator>
      <pubDate>Tue, 14 Apr 2026 21:39:04 +0000</pubDate>
      <link>https://dev.to/charles_ndungu/connecting-postgresql-to-power-bi-for-data-analysis-3pnj</link>
      <guid>https://dev.to/charles_ndungu/connecting-postgresql-to-power-bi-for-data-analysis-3pnj</guid>
      <description>&lt;h3&gt;
  
  
  Introduction: Power BI and the Role of SQL Databases
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Power BI&lt;/strong&gt; is a tool from Microsoft that turns raw data into interactive dashboards and reports. You know those beautiful charts and graphs that managers love? Power BI makes them. But here’s the thing – Power BI on its own doesn’t store data. It needs to connect to something that does store data. That’s where databases come in.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;database&lt;/strong&gt; is just an organised collection of information. Think of it like a giant, super‑fast Excel file that many people can use at the same time. Databases are divided into two main types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SQL databases&lt;/strong&gt; (&lt;em&gt;also called relational databases&lt;/em&gt;) – like PostgreSQL and MySQL – use tables with rows and columns, and you talk to them using the SQL language."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NoSQL databases&lt;/strong&gt; (like MongoDB) – they store data in other formats like JSON documents.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Connecting Power BI to a local PostgreSQL database
&lt;/h3&gt;

&lt;p&gt;When we say a &lt;strong&gt;local&lt;/strong&gt; database, we simply mean the database is installed and running on your own computer – not on a server somewhere else. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you need before you start&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Power BI Desktop&lt;/strong&gt; – free download from Microsoft.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PostgreSQL&lt;/strong&gt; installed and running on your machine.&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;database&lt;/strong&gt; with some tables already created – for example, a simple sales database with tables like customers, products, sales, and inventory.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;PostgreSQL connection driver&lt;/strong&gt; – when you first try to connect from Power BI, it will usually detect that the driver is missing and offer to download it for you. Just follow the prompts and say &lt;strong&gt;Yes&lt;/strong&gt; or &lt;strong&gt;Install&lt;/strong&gt;. &lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step-by-step to connecting Power BI to a locally hosted Database
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Step 1.&lt;/strong&gt; &lt;strong&gt;Open Power BI Desktop&lt;/strong&gt;. You’ll see a blank canvas with a ribbon at the top.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F730yr7kx5qb07s713xrk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F730yr7kx5qb07s713xrk.png" alt="Blank Canvas" width="800" height="421"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2.&lt;/strong&gt; On the &lt;strong&gt;Home&lt;/strong&gt; ribbon, click &lt;strong&gt;Get Data&lt;/strong&gt; (it’s on the left). A dropdown appears – choose &lt;strong&gt;More&lt;/strong&gt;… if you don’t see PostgreSQL right away.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faxtcm1fm5olt25duidkg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faxtcm1fm5olt25duidkg.png" alt="Postgresql" width="800" height="417"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3.&lt;/strong&gt; In the search box, type “PostgreSQL”. Select &lt;strong&gt;PostgreSQL Database&lt;/strong&gt; and click &lt;strong&gt;Connect&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyf3cq0smxihmxpqt17e9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyf3cq0smxihmxpqt17e9.png" alt="Selection" width="716" height="686"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4.&lt;/strong&gt; A small window asks for two things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Server&lt;/strong&gt;: type &lt;code&gt;localhost&lt;/code&gt; (that’s the special name your computer uses for itself).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database&lt;/strong&gt;: type the name of your database that contains the data. If you’re not sure, check in pgAdmin – right‑click on the database name.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbx0blw1f9xez2vpdr33f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbx0blw1f9xez2vpdr33f.png" alt="ok" width="707" height="357"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5.&lt;/strong&gt; You can leave &lt;strong&gt;Data Connectivity mode&lt;/strong&gt; as &lt;strong&gt;Import&lt;/strong&gt; . Click &lt;strong&gt;OK&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 6.&lt;/strong&gt; Power BI will ask for your PostgreSQL &lt;strong&gt;username&lt;/strong&gt; and &lt;strong&gt;password&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Username is usually &lt;code&gt;postgres&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Password is whatever you set when you installed PostgreSQL.
Choose &lt;strong&gt;Database&lt;/strong&gt; as the authentication method, then click &lt;strong&gt;Connect&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe1lx1t5xhv8i1w6d7hbc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe1lx1t5xhv8i1w6d7hbc.png" alt="Name &amp;amp; Password" width="800" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 7.&lt;/strong&gt; After a few seconds, the &lt;strong&gt;Navigator&lt;/strong&gt; window appears. It shows all the schemas and tables in your database. Select the tables you wish to work on and load the data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhh0l1i9xyxnrt5dquy3i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhh0l1i9xyxnrt5dquy3i.png" alt="Loading data" width="800" height="636"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 8.&lt;/strong&gt; You have two buttons at the bottom: Load and Transform Data.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Load&lt;/strong&gt; brings the tables straight into Power BI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transform Data&lt;/strong&gt; opens the Power Query Editor where you can clean or reshape the data first.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fekxdmdi3vxht1bjznm49.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fekxdmdi3vxht1bjznm49.png" alt="Power Query" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 9.&lt;/strong&gt; Once the cleaning is done you can &lt;strong&gt;create a relationship&lt;/strong&gt; between the tables. Open the &lt;strong&gt;model view&lt;/strong&gt; and inspect the relationship between the tables.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyui4jy0bt7zh7jingt2k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyui4jy0bt7zh7jingt2k.png" alt="Connection" width="800" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Connecting Power BI to a cloud PostgreSQL database (Aiven)
&lt;/h3&gt;

&lt;p&gt;In the real world, databases are rarely on your laptop. They live in the cloud – AWS, Google Cloud, or a service like &lt;strong&gt;Aiven&lt;/strong&gt;. Connecting to a cloud database is similar to local, but with one extra step: &lt;strong&gt;SSL certificates&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In a nutshell, what is SSL certificates?&lt;/strong&gt;&lt;br&gt;
Think of it this way – when you share data across the internet, you don't want some random person snooping on your passwords or your sales numbers. &lt;strong&gt;SSL&lt;/strong&gt; is just a technology that encrypts (secures) everything so only you and the database can read it. Cloud providers like &lt;strong&gt;Aiven&lt;/strong&gt; won't even let you connect without an &lt;strong&gt;SSL certificate&lt;/strong&gt;. It's a small file you download and install on your computer – it's like a &lt;strong&gt;digital ID badge&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1.&lt;/strong&gt; Download and install the certificate from Aiven&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Log into your &lt;strong&gt;Aiven&lt;/strong&gt; and go to your PostgreSQL service.&lt;/li&gt;
&lt;li&gt;On the &lt;strong&gt;Overview&lt;/strong&gt; tab, scroll down to &lt;strong&gt;Connection Information&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Click Download &lt;strong&gt;CA Certificate&lt;/strong&gt;. You’ll get a file named &lt;code&gt;ca.pem&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Rename it file from &lt;code&gt;ca.pem&lt;/code&gt; to &lt;code&gt;ca.crt&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Double‑click the &lt;code&gt;ca.crt&lt;/code&gt; file. A window pops up – click &lt;strong&gt;Install Certificate&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu2ipv3xmcqutim22yova.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu2ipv3xmcqutim22yova.png" alt="ca.crt" width="394" height="467"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Choose &lt;strong&gt;Local Machine&lt;/strong&gt; → Next.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkxhvoc095prr4vo44j4c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkxhvoc095prr4vo44j4c.png" alt="nct" width="500" height="481"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select &lt;em&gt;Place all certificates in the following store&lt;/em&gt; → &lt;strong&gt;Browse&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Choose &lt;strong&gt;Trusted Root Certification Authorities&lt;/strong&gt; → &lt;strong&gt;OK&lt;/strong&gt; → &lt;strong&gt;Next&lt;/strong&gt; → &lt;strong&gt;Finish&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv0fyvb5itco2uebsbuyy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv0fyvb5itco2uebsbuyy.png" alt="2 IN 1" width="529" height="557"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh3z9l7ivclodhr0bgao3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh3z9l7ivclodhr0bgao3.png" alt="Successful" width="218" height="138"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2.&lt;/strong&gt; Get your connection details&lt;/p&gt;

&lt;p&gt;Still in the Aiven Overview page, copy these values:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Host&lt;/strong&gt; (looks like &lt;code&gt;pg-1a2b3c4d-aivencloud.com&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Port&lt;/strong&gt; (a five‑digit number, e.g., &lt;code&gt;12345&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database name&lt;/strong&gt; (often &lt;code&gt;defaultdb&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Username&lt;/strong&gt; (usually &lt;code&gt;avnadmin&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Password&lt;/strong&gt; (the one you set or was generated)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6299t54zdatvqevctg84.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6299t54zdatvqevctg84.png" alt="Details" width="800" height="334"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3.&lt;/strong&gt; Connect from Power BI&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In Power BI, click &lt;strong&gt;Get Data&lt;/strong&gt; → &lt;strong&gt;PostgreSQL Database&lt;/strong&gt; (same as the local host connection).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvaa0ms33kow7nhku9zib.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvaa0ms33kow7nhku9zib.png" alt="Getting data" width="688" height="673"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In the &lt;strong&gt;Server&lt;/strong&gt; field, combine the host and port like this:
&lt;code&gt;pg-1a2b3c4d-aivencloud.com:12345&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkas5lvfzmyw5k4vxhuo0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkas5lvfzmyw5k4vxhuo0.png" alt="user" width="707" height="355"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Enter the &lt;strong&gt;Database&lt;/strong&gt; name (e.g., &lt;code&gt;defaultdb&lt;/code&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click &lt;strong&gt;OK&lt;/strong&gt;. Enter your Aiven username and password.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhygnapgfuapf0gc1n8g5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhygnapgfuapf0gc1n8g5.png" alt="user_name and password" width="800" height="430"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Navigator appears – same as before. Select your tables and click Load or Transform Data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F26uxc84vdoil3xgooj76.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F26uxc84vdoil3xgooj76.png" alt="T &amp;amp; L" width="800" height="636"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Loading tables and building relationships
&lt;/h3&gt;

&lt;p&gt;Once your tables are loaded (either from local or cloud), you need to tell Power BI how they relate to each other.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The model view&lt;/strong&gt; Click the Model icon on the left sidebar – it looks like three connected boxes. You’ll see your four tables floating. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Power BI sometimes guesses relationships automatically. But here we are going to;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Create these relationships (drag and drop):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;customers (&lt;code&gt;customer_id&lt;/code&gt;) → sales (&lt;code&gt;customer_id&lt;/code&gt;) → One‑to‑Many&lt;/li&gt;
&lt;li&gt;products (&lt;code&gt;product_id&lt;/code&gt;) → sales (&lt;code&gt;product_id&lt;/code&gt;) → One‑to‑Many&lt;/li&gt;
&lt;li&gt;products (&lt;code&gt;product_id&lt;/code&gt;) → inventory (&lt;code&gt;product_id&lt;/code&gt;) → One‑to‑One&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkn4k8z5t5wugw2q5cxgg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkn4k8z5t5wugw2q5cxgg.png" alt="Star Schema" width="800" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now when you build a chart, you can drag &lt;code&gt;product_name&lt;/code&gt; from &lt;code&gt;products&lt;/code&gt; and &lt;code&gt;total_amount&lt;/code&gt; from &lt;code&gt;sales&lt;/code&gt; onto the same visual – Power BI automatically joins them.&lt;/p&gt;

&lt;p&gt;Without these relationships, filters won't work correctly.&lt;/p&gt;
&lt;h3&gt;
  
  
  What is data modeling?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Data modeling&lt;/strong&gt; is basically how you decide your tables relate to each other. You can think of it as setting up a &lt;strong&gt;blueprint&lt;/strong&gt; for your data. In Power BI, this is what makes everything work together — when you click on a chart, the rest of the report updates because those relationships are &lt;strong&gt;already defined&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Relationships Matter&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Relationships are what make your data useful. Without them, your tables just sit there as separate pieces of information.&lt;/p&gt;

&lt;p&gt;For example, a sales report won’t be able to show the &lt;code&gt;customer’s name&lt;/code&gt;, filtering by &lt;code&gt;product category&lt;/code&gt; won’t affect your &lt;code&gt;sales numbers&lt;/code&gt;, and in some cases, you might even end up double-counting values.&lt;/p&gt;

&lt;p&gt;That’s how wrong insights happen — not because the data is bad, but because the connections aren’t set properly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Types of relationships&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;What it means&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;One-to-Many&lt;/td&gt;
&lt;td&gt;One record in Table A links to many in Table B&lt;/td&gt;
&lt;td&gt;One customer, many sales&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;One-to-One&lt;/td&gt;
&lt;td&gt;One record links to exactly one other&lt;/td&gt;
&lt;td&gt;One product, one inventory record&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Many-to-Many&lt;/td&gt;
&lt;td&gt;Many records on both sides (use carefully)&lt;/td&gt;
&lt;td&gt;Multiple products in multiple orders&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Bottom line&lt;/strong&gt;: Power BI is the kitchen. SQL is the prep cook who chops and cleans the ingredients first. Do the heavy lifting in SQL, and your dashboards will be faster and simpler.&lt;/p&gt;
&lt;h3&gt;
  
  
  Why SQL skills still matter
&lt;/h3&gt;

&lt;p&gt;You might be thinking: &lt;em&gt;"If Power BI already has drag-and-drop features, why do I still need SQL?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The simple answer is this: Power BI is great at &lt;strong&gt;showing data&lt;/strong&gt;, but SQL is better at &lt;strong&gt;preparing data&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lets see some situations where SQL will be useful.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Loading Only What You Need&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of importing entire tables, you can write a SQL query in the &lt;strong&gt;Advanced Options&lt;/strong&gt; when connecting.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
    &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_spent&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, Power BI only loads a summarized dataset — which is much faster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Filter at the Source&lt;/strong&gt;&lt;br&gt;
Filtering data in SQL happens before the data is even sent to Power BI.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;sale_date&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2024-01-01'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This reduces the size of the dataset and improves refresh performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Use Advanced SQL Features&lt;/strong&gt;&lt;br&gt;
Some operations are easier in SQL than in Power BI.&lt;/p&gt;

&lt;p&gt;For example, functions like &lt;code&gt;RANK()&lt;/code&gt; or &lt;code&gt;LAG()&lt;/code&gt; are straightforward in SQL but can get complicated in DAX.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Cleaning your Data Prior&lt;/strong&gt;&lt;br&gt;
It’s always better to fix data issues at the source.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
    &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;UPPER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;TRIM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;COALESCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'no-email@example.com'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This helps you avoid unnecessary cleaning steps inside Power BI.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;We walked through connecting PostgreSQL to Power BI – whether it's on your own laptop or up in the cloud with something like Aiven.&lt;/p&gt;

&lt;p&gt;What surprised me is that the connection part is actually the easiest step. The real work starts after that: loading the right tables, cleaning stuff up, getting the relationships right. If you skip that part, your charts will look weird and you'll be chasing problems forever.&lt;/p&gt;

&lt;p&gt;I was working with a pretty small dataset – just a few tables, nothing crazy. But even then, I could see the pattern. It's not about which tool you use. It's about whether you bothered to prepare your data properly before dragging things onto the canvas.&lt;/p&gt;

&lt;p&gt;Get the structure right, and Power BI feels almost too easy. Get it wrong, and you'll spend hours fixing filters and wondering why your totals don't match. I've been there. It's not fun.&lt;/p&gt;

&lt;p&gt;So yeah – take the extra time to model things correctly. You'll have fun and enjoy every bit.&lt;/p&gt;

</description>
      <category>database</category>
      <category>datascience</category>
      <category>postgres</category>
      <category>powerbi</category>
    </item>
    <item>
      <title>Machine Learning Basics: What is ML? Supervised vs Unsupervised, Features vs Labels</title>
      <dc:creator>Charles</dc:creator>
      <pubDate>Mon, 13 Apr 2026 12:09:25 +0000</pubDate>
      <link>https://dev.to/charles_ndungu/machine-learning-basics-what-is-ml-supervised-vs-unsupervised-features-vs-labels-2khf</link>
      <guid>https://dev.to/charles_ndungu/machine-learning-basics-what-is-ml-supervised-vs-unsupervised-features-vs-labels-2khf</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjyfltih3tuypi0l9yrwi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjyfltih3tuypi0l9yrwi.png" alt="ML" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  In this article, we will cover:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;What ML really is&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;The difference between &lt;strong&gt;supervised&lt;/strong&gt; and &lt;strong&gt;unsupervised&lt;/strong&gt; learning&lt;/li&gt;
&lt;li&gt;What &lt;strong&gt;features&lt;/strong&gt; and &lt;strong&gt;labels&lt;/strong&gt; are – and why they matter&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What is machine learning?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Machine learning (ML)&lt;/strong&gt; is the subset of artificial intelligence (AI) focused on algorithms that can &lt;em&gt;learn&lt;/em&gt; the patterns of training data and, subsequently, make accurate inferences about new data. This pattern recognition ability enables machine learning models to make decisions or predictions without explicit, hard-coded instructions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples of Machine Learning.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Personal assistants and voice assistants.&lt;/strong&gt;&lt;br&gt;
ML powers popular virtual assistants like Amazon Alexa and Apple Siri. It enables speech recognition, natural language processing (NLP), and text-to-speech conversion. When you ask a question, ML not only understands your intent but also searches for relevant answers or recalls similar past interactions for more personalized responses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Email Filtering and Management.&lt;/strong&gt;&lt;br&gt;
ML algorithms in Gmail automatically categorize emails into Primary, Social, and Promotions tabs while detecting and moving spam to the spam folder. Beyond basic rules, ML tools classify incoming emails, route them to the right team members, extract attachments, and enable automated personalized replies. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Transportation and Navigation.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Machine Learning has transformed modern transportation in several ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Google Maps uses ML to analyze real-time traffic conditions, calculate the fastest routes, suggest nearby places to explore, and provide accurate arrival time predictions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Ride-sharing apps like Uber and Bolt apply ML to match riders with drivers, dynamically set pricing (surge pricing), optimize routes based on live traffic, and predict accurate ETAs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Self-driving cars (e.g., Tesla) rely heavily on computer vision and unsupervised ML algorithms. These systems process data from cameras and sensors in real-time to understand their surroundings and make instant driving decisions.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Types of machine learning
&lt;/h2&gt;

&lt;p&gt;Machine Learning generally falls into two main learning paradigms: Supervised Learning and Unsupervised Learning. These differ based on the type of data they use and the objective they aim to achieve.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Supervised Learning&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqrm7udin7vxdy2ga31ye.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqrm7udin7vxdy2ga31ye.png" alt="Supervised Learning" width="800" height="376"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Supervised learning trains a model using labeled data — where every input example is paired with the correct output (label). The goal is to learn the mapping between inputs and outputs so the model can accurately predict outcomes on new, unseen data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Common Tasks:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Classification&lt;/strong&gt; — Predict discrete categories (e.g., spam/not spam, cat/dog, approve/reject loan)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Regression&lt;/strong&gt; — Predict continuous values (e.g., house price, temperature, sales forecast)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;&lt;br&gt;
In supervised learning, the model learns from examples where the answers are already known. It is given inputs (features) together with the correct outputs (labels), and over time it identifies patterns in the data. As it trains, it continuously adjusts itself to reduce the difference between its predictions and the actual answers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-world examples:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spam detection, &lt;/li&gt;
&lt;li&gt;Image classification, &lt;/li&gt;
&lt;li&gt;Credit risk scoring.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Analogy:&lt;/strong&gt;&lt;br&gt;
Think of a student learning with a teacher. The teacher shows examples and clearly labels them — “this is a cat,” “this is a dog.” Over time, the student begins to recognize the differences and can correctly identify new animals on their own.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Unsupervised Learning&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvubs01e7eam1rzlgm9v8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvubs01e7eam1rzlgm9v8.png" alt="Unsupervised Learning" width="736" height="522"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Unsupervised learning works with unlabeled data. The model must discover hidden patterns, structures, or groupings on its own — without any “correct answers” provided.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Common tasks:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Clustering&lt;/strong&gt; — grouping similar data points together (e.g., customer segmentation)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Association&lt;/strong&gt; — finding relationships in data (e.g., people who buy X also buy Y)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Dimensionality reduction&lt;/strong&gt; — simplifying data while keeping the most important information&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Real-world examples:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Customer segmentation in retail (grouping shoppers based on buying habits),&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fraud detection in mobile money or banking (flagging unusual transactions),&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Product recommendations on e-commerce sites (suggesting items similar to what you’ve viewed),&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Music or movie suggestions based on what you like (Spotify, Prime Video).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Supervised vs Unsupervised Learning
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Supervised Learning&lt;/th&gt;
&lt;th&gt;Unsupervised Learning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data used&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Labeled (features + answers)&lt;/td&gt;
&lt;td&gt;Unlabeled (just features, no answers)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Goal&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Predict an output / category&lt;/td&gt;
&lt;td&gt;Find hidden patterns or groupings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Task types&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Classification &amp;amp; regression&lt;/td&gt;
&lt;td&gt;Clustering, association, dimensionality reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;How hard to evaluate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Easy – you have ground truth to compare&lt;/td&gt;
&lt;td&gt;Trickier – no "right answer" to check against&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Real‑world examples&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Spam detection, price prediction&lt;/td&gt;
&lt;td&gt;Customer segments, fraud detection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Complexity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Generally simpler&lt;/td&gt;
&lt;td&gt;More complex (no teacher to guide)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Key Takeaway:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Use Supervised Learning when you have labeled historical data and want to make predictions. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use Unsupervised Learning when you have lots of raw data and want to discover insights or patterns you didn’t already know.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Modern systems often combine both. For example, many Large Language Models (LLMs) use self‑supervised learning during pre‑training, followed by supervised fine‑tuning and RLHF (reinforcement learning from human feedback).&lt;/p&gt;

&lt;h2&gt;
  
  
  Features vs Labels
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flncjv90aku2634j4yzvo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flncjv90aku2634j4yzvo.png" alt="F vs L" width="800" height="376"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're doing supervised learning, you'll run into two terms constantly: &lt;strong&gt;features&lt;/strong&gt; and &lt;strong&gt;labels&lt;/strong&gt;. Here's what they actually mean.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is a Feature?&lt;/strong&gt;&lt;br&gt;
A &lt;strong&gt;feature&lt;/strong&gt; is any piece of information you feed the model – a clue that helps it make a prediction. Features are also called &lt;em&gt;independent variables&lt;/em&gt;, &lt;em&gt;predictors&lt;/em&gt;, or &lt;em&gt;attributes&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples of Features:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In house price prediction: square footage, number of bedrooms&lt;/li&gt;
&lt;li&gt;In spam detection: length of email, number of capital letters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Features&lt;/strong&gt; can be numerical (age, price), categorical (gender, color), or text-based.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is a Label?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;label&lt;/strong&gt; is the answer the model tries to guess – the output or correct answer. Also called &lt;em&gt;target&lt;/em&gt; or &lt;em&gt;dependent variable&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples of Labels:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;House price prediction --&amp;gt; Actual sale price (Kshs)&lt;/li&gt;
&lt;li&gt;Spam detection --&amp;gt; “Spam” or “Not Spam”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Labels&lt;/strong&gt; are only available in supervised learning because they represent the ground truth.&lt;/p&gt;

&lt;h3&gt;
  
  
  Features vs Labels – Quick Comparison
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Features (the inputs)&lt;/th&gt;
&lt;th&gt;Label (the answer)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;What it is&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;What the model uses to learn&lt;/td&gt;
&lt;td&gt;What the model tries to guess&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Other names&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Independent variables, predictors&lt;/td&gt;
&lt;td&gt;Target variable, dependent variable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Do you always have it?&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes – in any dataset&lt;/td&gt;
&lt;td&gt;Only in supervised learning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;House price example&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Size, bedrooms, location&lt;/td&gt;
&lt;td&gt;The price tag&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Key Takeaway:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Features = clues. Label = the answer.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;When preparing data for a supervised model, split it into &lt;code&gt;X&lt;/code&gt; (features) and &lt;code&gt;y&lt;/code&gt; (label).
&lt;/li&gt;
&lt;li&gt;Garbage in --&amp;gt; garbage out: bad features or wrong labels will ruin your model.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Machine Learning&lt;/strong&gt; lets computers learn from data without hard‑coded rules.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Supervised learning&lt;/strong&gt; uses labeled data to predict outcomes (spam detection, prices).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unsupervised learning&lt;/strong&gt; finds hidden patterns in unlabeled data (customer segments, fraud).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Features&lt;/strong&gt; are the clues you feed the model. &lt;strong&gt;Labels&lt;/strong&gt; are the answers you want to predict.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>tutorial</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Data lives scattered across tables—customers here, products there, sales everywhere. This article is your guide to bringing it all together. Follow along with a real online store dataset as we build queries that rank top customers, spot sales trends, and u</title>
      <dc:creator>Charles</dc:creator>
      <pubDate>Sat, 07 Mar 2026 17:31:11 +0000</pubDate>
      <link>https://dev.to/charles_ndungu/data-lives-scattered-across-tables-customers-here-products-there-sales-everywhere-this-article-1j6g</link>
      <guid>https://dev.to/charles_ndungu/data-lives-scattered-across-tables-customers-here-products-there-sales-everywhere-this-article-1j6g</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/charles_ndungu" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3711276%2F8ca2a451-2bc3-4522-beb1-86d3f397b4b8.png" alt="charles_ndungu"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/charles_ndungu/mastering-sql-joins-and-window-functions-a-practical-guide-with-an-e-commerce-dataset-ejg" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Mastering SQL Joins and Window Functions: A Practical Guide with an E‑commerce Dataset&lt;/h2&gt;
      &lt;h3&gt;Charles ・ Mar 7&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#sql&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#database&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#softwaredevelopment&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#datascience&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>sql</category>
      <category>database</category>
      <category>softwaredevelopment</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Mastering SQL Joins and Window Functions: A Practical Guide with an E‑commerce Dataset</title>
      <dc:creator>Charles</dc:creator>
      <pubDate>Sat, 07 Mar 2026 17:23:41 +0000</pubDate>
      <link>https://dev.to/charles_ndungu/mastering-sql-joins-and-window-functions-a-practical-guide-with-an-e-commerce-dataset-ejg</link>
      <guid>https://dev.to/charles_ndungu/mastering-sql-joins-and-window-functions-a-practical-guide-with-an-e-commerce-dataset-ejg</guid>
      <description>&lt;h3&gt;
  
  
  Introduction &amp;amp; The Dataset
&lt;/h3&gt;

&lt;p&gt;If you're working with SQL, you've probably used basic &lt;code&gt;SELECT&lt;/code&gt; statements, filtered with &lt;code&gt;WHERE&lt;/code&gt;, and maybe even done some simple aggregations. But the real power of SQL unfolds when you start combining data from multiple tables and performing complex calculations across rows without losing detail. That's where &lt;strong&gt;JOINs&lt;/strong&gt; and &lt;strong&gt;Window Functions&lt;/strong&gt; come in.&lt;/p&gt;

&lt;p&gt;In this article, we'll dive deep into these advanced concepts using a realistic e‑commerce dataset. &lt;br&gt;
By the end, you'll be able to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Combine data from multiple tables using different types of joins.&lt;/li&gt;
&lt;li&gt;Perform rankings, running totals, and comparisons across rows with window functions.&lt;/li&gt;
&lt;li&gt;Organize complex queries using Common Table Expressions (CTEs).&lt;/li&gt;
&lt;li&gt;Solve practical business questions that real data analysts face every day.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We'll use a consistent dataset throughout – so you can follow along, try the queries yourself, and truly master these techniques.&lt;/p&gt;
&lt;h3&gt;
  
  
  Meet Our Data
&lt;/h3&gt;

&lt;p&gt;Imagine we're running an online store. Our database has three main tables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;customers&lt;/code&gt; – people who registered on our site.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;products&lt;/code&gt; – items we sell, with prices and stock.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sales&lt;/code&gt; – records of purchases, linking customers to products.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here are the table structures (simplified for clarity) and a few sample rows.&lt;/p&gt;
&lt;h4&gt;
  
  
  1. Customers
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100),
    phone_number VARCHAR(50),
    registration_date DATE,
    membership_status VARCHAR(10)  -- 'Bronze', 'Silver', 'Gold'
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;customer_id&lt;/th&gt;
&lt;th&gt;first_name&lt;/th&gt;
&lt;th&gt;last_name&lt;/th&gt;
&lt;th&gt;email&lt;/th&gt;
&lt;th&gt;phone_number&lt;/th&gt;
&lt;th&gt;registration_date&lt;/th&gt;
&lt;th&gt;membership_status&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Karen&lt;/td&gt;
&lt;td&gt;Molina&lt;/td&gt;
&lt;td&gt;&lt;a href="mailto:gonzalezkimberly@glass.com"&gt;gonzalezkimberly@glass.com&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;(728)697-1206&lt;/td&gt;
&lt;td&gt;2020-08-27&lt;/td&gt;
&lt;td&gt;Bronze&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Elizabeth&lt;/td&gt;
&lt;td&gt;Archer&lt;/td&gt;
&lt;td&gt;&lt;a href="mailto:tramirez@gmail.com"&gt;tramirez@gmail.com&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;778.104.6553&lt;/td&gt;
&lt;td&gt;2023-08-28&lt;/td&gt;
&lt;td&gt;Silver&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Roberta&lt;/td&gt;
&lt;td&gt;Massey&lt;/td&gt;
&lt;td&gt;&lt;a href="mailto:davislori@gmail.com"&gt;davislori@gmail.com&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;+1-365-606-7458&lt;/td&gt;
&lt;td&gt;2024-06-12&lt;/td&gt;
&lt;td&gt;Bronze&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h4&gt;
  
  
  2. Products
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE products (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(100),
    category VARCHAR(50),
    price DECIMAL(10, 2),
    supplier VARCHAR(100),
    stock_quantity INT
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;product_id&lt;/th&gt;
&lt;th&gt;product_name&lt;/th&gt;
&lt;th&gt;category&lt;/th&gt;
&lt;th&gt;price&lt;/th&gt;
&lt;th&gt;supplier&lt;/th&gt;
&lt;th&gt;stock_quantity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Laptop&lt;/td&gt;
&lt;td&gt;Electronics&lt;/td&gt;
&lt;td&gt;999.99&lt;/td&gt;
&lt;td&gt;Dell&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Smartphone&lt;/td&gt;
&lt;td&gt;Electronics&lt;/td&gt;
&lt;td&gt;799.99&lt;/td&gt;
&lt;td&gt;Samsung&lt;/td&gt;
&lt;td&gt;150&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Washing Machine&lt;/td&gt;
&lt;td&gt;Appliances&lt;/td&gt;
&lt;td&gt;499.99&lt;/td&gt;
&lt;td&gt;LG&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Headphones&lt;/td&gt;
&lt;td&gt;Accessories&lt;/td&gt;
&lt;td&gt;199.99&lt;/td&gt;
&lt;td&gt;Sony&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h4&gt;
  
  
  3. Sales
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE sales (
    sale_id INT PRIMARY KEY,
    customer_id INT,
    product_id INT,
    quantity_sold INT,
    sale_date DATE,
    total_amount DECIMAL(10, 2),
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;sale_id&lt;/th&gt;
&lt;th&gt;customer_id&lt;/th&gt;
&lt;th&gt;product_id&lt;/th&gt;
&lt;th&gt;quantity_sold&lt;/th&gt;
&lt;th&gt;sale_date&lt;/th&gt;
&lt;th&gt;total_amount&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2023-07-15&lt;/td&gt;
&lt;td&gt;999.99&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2023-08-20&lt;/td&gt;
&lt;td&gt;1599.98&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2023-09-10&lt;/td&gt;
&lt;td&gt;499.99&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2023-07-25&lt;/td&gt;
&lt;td&gt;599.97&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h4&gt;
  
  
  What We'll Build
&lt;/h4&gt;

&lt;p&gt;We'll use these tables to answer questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Which customers have never placed an order?"&lt;/li&gt;
&lt;li&gt;"What is the total revenue per product category?"&lt;/li&gt;
&lt;li&gt;"Who are the top 3 customers by spending, and how do they rank within their membership tier?"&lt;/li&gt;
&lt;li&gt;"How does each sale compare to the previous sale for the same customer?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By the time you finish reading, you'll be able to write queries that combine these tables and apply window functions like a pro.&lt;/p&gt;
&lt;h3&gt;
  
  
  SQL Joins – Bringing Data Together
&lt;/h3&gt;

&lt;p&gt;In a well‑designed database, data is split across multiple tables to avoid redundancy. For example, we store customer details in one table, product information in another, and sales transactions in a third. To answer real‑world questions, we need to join these tables back together.&lt;/p&gt;
&lt;h3&gt;
  
  
  INNER JOIN
&lt;/h3&gt;

&lt;p&gt;An &lt;code&gt;INNER JOIN&lt;/code&gt; returns only the rows that have matching values in both tables. Rows without a match are excluded.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use case:&lt;/strong&gt; Find all customers who have placed orders, along with their order details.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example 1: List customers and their orders&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT 
    c.customer_id,
    c.first_name,
    c.last_name,
    s.sale_id,
    s.sale_date,
    s.total_amount
FROM customers c
INNER JOIN sales s ON c.customer_id = s.customer_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;customer_id&lt;/th&gt;
&lt;th&gt;first_name&lt;/th&gt;
&lt;th&gt;last_name&lt;/th&gt;
&lt;th&gt;sale_id&lt;/th&gt;
&lt;th&gt;sale_date&lt;/th&gt;
&lt;th&gt;total_amount&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Karen&lt;/td&gt;
&lt;td&gt;Molina&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2023-07-15&lt;/td&gt;
&lt;td&gt;999.99&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Elizabeth&lt;/td&gt;
&lt;td&gt;Archer&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2023-08-20&lt;/td&gt;
&lt;td&gt;1599.98&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Roberta&lt;/td&gt;
&lt;td&gt;Massey&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2023-09-10&lt;/td&gt;
&lt;td&gt;499.99&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Jacob&lt;/td&gt;
&lt;td&gt;Adams&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;2023-07-25&lt;/td&gt;
&lt;td&gt;599.97&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Cynthia&lt;/td&gt;
&lt;td&gt;Lowery&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;2023-06-18&lt;/td&gt;
&lt;td&gt;1200.00&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;What happened?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;INNER JOIN&lt;/code&gt; combined rows from &lt;code&gt;customers&lt;/code&gt; and &lt;code&gt;sales&lt;/code&gt; wherever &lt;code&gt;customer_id&lt;/code&gt; matched. Customers who have never placed an order (like some of our 50 customers) do not appear in the result.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example 2: Add product details to the orders&lt;/strong&gt;&lt;br&gt;
We can join more than two tables. Let's include the product name and price.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT 
    c.first_name,
    c.last_name,
    s.sale_date,
    p.product_name,
    s.quantity_sold,
    s.total_amount
FROM customers c
INNER JOIN sales s ON c.customer_id = s.customer_id
INNER JOIN products p ON s.product_id = p.product_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;first_name&lt;/th&gt;
&lt;th&gt;last_name&lt;/th&gt;
&lt;th&gt;sale_date&lt;/th&gt;
&lt;th&gt;product_name&lt;/th&gt;
&lt;th&gt;quantity_sold&lt;/th&gt;
&lt;th&gt;total_amount&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Karen&lt;/td&gt;
&lt;td&gt;Molina&lt;/td&gt;
&lt;td&gt;2023-07-15&lt;/td&gt;
&lt;td&gt;Laptop&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;999.99&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Elizabeth&lt;/td&gt;
&lt;td&gt;Archer&lt;/td&gt;
&lt;td&gt;2023-08-20&lt;/td&gt;
&lt;td&gt;Smartphone&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;1599.98&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Roberta&lt;/td&gt;
&lt;td&gt;Massey&lt;/td&gt;
&lt;td&gt;2023-09-10&lt;/td&gt;
&lt;td&gt;Washing Machine&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;499.99&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Jacob&lt;/td&gt;
&lt;td&gt;Adams&lt;/td&gt;
&lt;td&gt;2023-07-25&lt;/td&gt;
&lt;td&gt;Headphones&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;599.97&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cynthia&lt;/td&gt;
&lt;td&gt;Lowery&lt;/td&gt;
&lt;td&gt;2023-06-18&lt;/td&gt;
&lt;td&gt;Refrigerator&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1200.00&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Key point:&lt;/strong&gt; Each &lt;code&gt;INNER JOIN&lt;/code&gt; adds another table, filtering out rows that don't match the join condition. Only orders with valid customer and product IDs appear.&lt;/p&gt;

&lt;h3&gt;
  
  
  LEFT JOIN (or LEFT OUTER JOIN)
&lt;/h3&gt;

&lt;p&gt;A &lt;code&gt;LEFT JOIN&lt;/code&gt; returns &lt;strong&gt;all rows from the left table,&lt;/strong&gt; and matching rows from the right table. If there's no match, the right‑table columns are filled with &lt;code&gt;NULL&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use case:&lt;/strong&gt; Show &lt;strong&gt;all customers,&lt;/strong&gt; even those who have never placed an order, and include any orders they may have.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT 
    c.customer_id,
    c.first_name,
    c.last_name,
    s.sale_id,
    s.sale_date,
    s.total_amount
FROM customers c
LEFT JOIN sales s ON c.customer_id = s.customer_id
ORDER BY c.customer_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result (first few rows):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;customer_id&lt;/th&gt;
&lt;th&gt;first_name&lt;/th&gt;
&lt;th&gt;last_name&lt;/th&gt;
&lt;th&gt;sale_id&lt;/th&gt;
&lt;th&gt;sale_date&lt;/th&gt;
&lt;th&gt;total_amount&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Karen&lt;/td&gt;
&lt;td&gt;Molina&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2023-07-15&lt;/td&gt;
&lt;td&gt;999.99&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Elizabeth&lt;/td&gt;
&lt;td&gt;Archer&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2023-08-20&lt;/td&gt;
&lt;td&gt;1599.98&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Roberta&lt;/td&gt;
&lt;td&gt;Massey&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2023-09-10&lt;/td&gt;
&lt;td&gt;499.99&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Jacob&lt;/td&gt;
&lt;td&gt;Adams&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;2023-07-25&lt;/td&gt;
&lt;td&gt;599.97&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Cynthia&lt;/td&gt;
&lt;td&gt;Lowery&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;2023-06-18&lt;/td&gt;
&lt;td&gt;1200.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Emily&lt;/td&gt;
&lt;td&gt;King&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Linda&lt;/td&gt;
&lt;td&gt;Larsen&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Notice:&lt;/strong&gt; Customers 6 and 7 (Emily King, Linda Larsen) have no sales, so the &lt;code&gt;sale_id&lt;/code&gt;, &lt;code&gt;sale_date&lt;/code&gt;, and &lt;code&gt;total_amount&lt;/code&gt; are &lt;code&gt;NULL&lt;/code&gt;. This is a great way to find customers who haven't purchased anything.&lt;/p&gt;

&lt;h3&gt;
  
  
  RIGHT JOIN
&lt;/h3&gt;

&lt;p&gt;A &lt;code&gt;RIGHT JOIN&lt;/code&gt; is the mirror of &lt;code&gt;LEFT JOIN&lt;/code&gt;: it returns &lt;strong&gt;all rows from the right table,&lt;/strong&gt; and matching rows from the left. Unmatched left‑table columns become &lt;code&gt;NULL&lt;/code&gt;.&lt;br&gt;
Because you can always rewrite a &lt;code&gt;RIGHT JOIN&lt;/code&gt; as a &lt;code&gt;LEFT JOIN&lt;/code&gt; by swapping the table order, many developers rarely use it. But it's good to know it exists.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use case:&lt;/strong&gt; List all products and show which ones have been sold (if any).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT 
    p.product_id,
    p.product_name,
    s.sale_id,
    s.sale_date,
    s.total_amount
FROM sales s
RIGHT JOIN products p ON s.product_id = p.product_id
ORDER BY p.product_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result (first few rows):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;product_id&lt;/th&gt;
&lt;th&gt;product_name&lt;/th&gt;
&lt;th&gt;sale_id&lt;/th&gt;
&lt;th&gt;sale_date&lt;/th&gt;
&lt;th&gt;total_amount&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Laptop&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2023-07-15&lt;/td&gt;
&lt;td&gt;999.99&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Smartphone&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2023-08-20&lt;/td&gt;
&lt;td&gt;1599.98&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Washing Machine&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2023-09-10&lt;/td&gt;
&lt;td&gt;499.99&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Headphones&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;2023-07-25&lt;/td&gt;
&lt;td&gt;599.97&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Refrigerator&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;2023-06-18&lt;/td&gt;
&lt;td&gt;1200.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Smart TV&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Microwave&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Products 6 (Smart TV) and 7 (Microwave) haven't been sold – their sale details are &lt;code&gt;NULL&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  FULL OUTER JOIN
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;FULL OUTER JOIN&lt;/code&gt; returns &lt;strong&gt;all rows from both tables,&lt;/strong&gt; matching where possible. Unmatched rows from either side are filled with &lt;code&gt;NULL&lt;/code&gt;. It's like a combination of &lt;code&gt;LEFT&lt;/code&gt; and &lt;code&gt;RIGHT&lt;/code&gt; joins.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use case:&lt;/strong&gt; See every customer and every product side by side – which is not very common in practice, but useful for finding orphans.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT 
    c.customer_id,
    c.first_name,
    c.last_name,
    p.product_id,
    p.product_name,
    s.sale_id
FROM customers c
FULL OUTER JOIN sales s ON c.customer_id = s.customer_id
FULL OUTER JOIN products p ON s.product_id = p.product_id
LIMIT 10;  -- just to keep the output manageable
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result (truncated):&lt;/strong&gt;&lt;br&gt;
You'll get a mix of customers without sales, products without sales, and actual transactions. Many &lt;code&gt;NULL&lt;/code&gt; cells appear.&lt;/p&gt;

&lt;p&gt;Because &lt;code&gt;FULL OUTER JOIN&lt;/code&gt; can produce huge result sets, it's used less often, but it's indispensable for certain data reconciliation tasks.&lt;/p&gt;
&lt;h3&gt;
  
  
  CROSS JOIN
&lt;/h3&gt;

&lt;p&gt;A &lt;code&gt;CROSS JOIN&lt;/code&gt; returns the Cartesian product of two tables – every row from the first table paired with every row from the second. There's no &lt;code&gt;ON&lt;/code&gt; condition.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use case:&lt;/strong&gt; Generate all possible combinations, e.g., for creating product‑customer promotional lists.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT 
    c.customer_id,
    c.first_name,
    p.product_id,
    p.product_name
FROM customers c
CROSS JOIN products p
WHERE c.customer_id &amp;lt;= 3 AND p.product_id &amp;lt;= 3  -- limit for readability
ORDER BY c.customer_id, p.product_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;customer_id&lt;/th&gt;
&lt;th&gt;first_name&lt;/th&gt;
&lt;th&gt;product_id&lt;/th&gt;
&lt;th&gt;product_name&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Karen&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Laptop&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Karen&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Smartphone&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Karen&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Washing Machine&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Elizabeth&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Laptop&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Elizabeth&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Smartphone&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Elizabeth&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Washing Machine&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Roberta&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Laptop&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Roberta&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Smartphone&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Roberta&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Washing Machine&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Every combination of the selected customers and products appears. This is handy for generating base data for analysis.&lt;/p&gt;

&lt;h3&gt;
  
  
  SELF JOIN
&lt;/h3&gt;

&lt;p&gt;A &lt;code&gt;SELF JOIN&lt;/code&gt; is when a table is joined with itself. It's useful for hierarchical data or comparing rows within the same table.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use case:&lt;/strong&gt; Find pairs of customers who have the same membership status.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT 
    c1.customer_id AS customer1,
    c1.first_name AS first1,
    c1.last_name AS last1,
    c2.customer_id AS customer2,
    c2.first_name AS first2,
    c2.last_name AS last2,
    c1.membership_status
FROM customers c1
JOIN customers c2 ON c1.membership_status = c2.membership_status
WHERE c1.customer_id &amp;lt; c2.customer_id
ORDER BY c1.membership_status, customer1;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result (few rows):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;customer1&lt;/th&gt;
&lt;th&gt;first1&lt;/th&gt;
&lt;th&gt;last1&lt;/th&gt;
&lt;th&gt;customer2&lt;/th&gt;
&lt;th&gt;first2&lt;/th&gt;
&lt;th&gt;last2&lt;/th&gt;
&lt;th&gt;membership_status&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Karen&lt;/td&gt;
&lt;td&gt;Molina&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Roberta&lt;/td&gt;
&lt;td&gt;Massey&lt;/td&gt;
&lt;td&gt;Bronze&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Karen&lt;/td&gt;
&lt;td&gt;Molina&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Angela&lt;/td&gt;
&lt;td&gt;Hanson&lt;/td&gt;
&lt;td&gt;Bronze&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Karen&lt;/td&gt;
&lt;td&gt;Molina&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Whitney&lt;/td&gt;
&lt;td&gt;Wilson&lt;/td&gt;
&lt;td&gt;Bronze&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Elizabeth&lt;/td&gt;
&lt;td&gt;Archer&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Cynthia&lt;/td&gt;
&lt;td&gt;Lowery&lt;/td&gt;
&lt;td&gt;Silver&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We use &lt;code&gt;c1.customer_id &amp;lt; c2.customer_id&lt;/code&gt; to avoid duplicate pairs (e.g., (1,3) and (3,1)) and to exclude a customer paired with themselves.&lt;/p&gt;

&lt;h4&gt;
  
  
  Joining Multiple Tables – Putting It All Together
&lt;/h4&gt;

&lt;p&gt;Often you need to join more than two tables. Here's a query that combines customers, sales, and products to give a complete picture of each transaction.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT 
    c.first_name,
    c.last_name,
    s.sale_date,
    p.product_name,
    p.category,
    s.quantity_sold,
    s.total_amount
FROM customers c
JOIN sales s ON c.customer_id = s.customer_id
JOIN products p ON s.product_id = p.product_id
WHERE c.membership_status = 'Gold'   -- only Gold members
ORDER BY s.sale_date;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result (few rows):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;first_name&lt;/th&gt;
&lt;th&gt;last_name&lt;/th&gt;
&lt;th&gt;sale_date&lt;/th&gt;
&lt;th&gt;product_name&lt;/th&gt;
&lt;th&gt;category&lt;/th&gt;
&lt;th&gt;quantity_sold&lt;/th&gt;
&lt;th&gt;total_amount&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;James&lt;/td&gt;
&lt;td&gt;Rodriguez&lt;/td&gt;
&lt;td&gt;2023-07-15&lt;/td&gt;
&lt;td&gt;Gaming Console&lt;/td&gt;
&lt;td&gt;Electronics&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;350.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Steven&lt;/td&gt;
&lt;td&gt;Burnett&lt;/td&gt;
&lt;td&gt;2023-08-01&lt;/td&gt;
&lt;td&gt;Microwave&lt;/td&gt;
&lt;td&gt;Appliances&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;180.00&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This kind of multi‑table join is the backbone of reporting and analytics.&lt;/p&gt;

&lt;p&gt;Now that you've seen all join types, you're ready to combine data from any set of related tables. In the next section, we'll explore &lt;strong&gt;Window Functions&lt;/strong&gt; – a whole new level of analytical power.&lt;/p&gt;

&lt;h3&gt;
  
  
  Window Functions – Analyzing Data Without Losing Detail
&lt;/h3&gt;

&lt;p&gt;Window functions are a game‑changer. Unlike regular aggregate functions &lt;code&gt;(GROUP BY)&lt;/code&gt; that collapse multiple rows into one, window functions perform calculations &lt;strong&gt;across a set of rows related to the current row&lt;/strong&gt; while keeping each row intact. This makes them perfect for rankings, running totals, moving averages, and comparing values between rows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Basic Syntax
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;function_name (expression) OVER (
    [PARTITION BY column(s)]
    [ORDER BY column(s)]
    [frame_clause]
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;function_name&lt;/code&gt;: e.g., &lt;code&gt;ROW_NUMBER()&lt;/code&gt;, &lt;code&gt;RANK()&lt;/code&gt;, &lt;code&gt;SUM()&lt;/code&gt;, &lt;code&gt;LAG()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PARTITION BY&lt;/code&gt;: divides rows into groups (optional) – the function is applied separately to each group.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ORDER BY&lt;/code&gt;: defines the order within each partition.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;frame_clause&lt;/strong&gt;: specifies which rows to include (for aggregate window functions).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We'll use our &lt;code&gt;sales&lt;/code&gt; table, sometimes joined with &lt;code&gt;customers&lt;/code&gt; or &lt;code&gt;products&lt;/code&gt;, to illustrate.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. ROW_NUMBER() – Unique Sequential Number
&lt;/h3&gt;

&lt;p&gt;Assigns a unique integer to each row within a partition, starting at 1. Ties are broken arbitrarily.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; Number all sales by date&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT 
    sale_id,
    customer_id,
    sale_date,
    total_amount,
    ROW_NUMBER() OVER (ORDER BY sale_date) AS row_num
FROM sales;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;sale_id&lt;/th&gt;
&lt;th&gt;customer_id&lt;/th&gt;
&lt;th&gt;sale_date&lt;/th&gt;
&lt;th&gt;total_amount&lt;/th&gt;
&lt;th&gt;row_num&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;2023-06-18&lt;/td&gt;
&lt;td&gt;1200.00&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2023-07-15&lt;/td&gt;
&lt;td&gt;999.99&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;2023-07-25&lt;/td&gt;
&lt;td&gt;599.97&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2023-08-20&lt;/td&gt;
&lt;td&gt;1599.98&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2023-09-10&lt;/td&gt;
&lt;td&gt;499.99&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Use case:&lt;/strong&gt; Pagination, assigning a unique identifier to each row in a result set.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. RANK() and DENSE_RANK() – Ranking with Ties
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;RANK()&lt;/code&gt; gives the same rank to equal values, then skips the next rank(s) (e.g., 1,1,3).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;DENSE_RANK()&lt;/code&gt; also gives the same rank to ties, but does not skip numbers (e.g., 1,1,2).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; Rank customers by total spending (including those with no purchases).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WITH customer_spending AS (
    SELECT 
        c.customer_id,
        c.first_name,
        c.last_name,
        COALESCE(SUM(s.total_amount), 0) AS total_spent
    FROM customers c
    LEFT JOIN sales s ON c.customer_id = s.customer_id
    GROUP BY c.customer_id, c.first_name, c.last_name
)
SELECT 
    customer_id,
    first_name,
    last_name,
    total_spent,
    RANK() OVER (ORDER BY total_spent DESC) AS rank,
    DENSE_RANK() OVER (ORDER BY total_spent DESC) AS dense_rank
FROM customer_spending
ORDER BY total_spent DESC
LIMIT 10;  -- just show top 10 for brevity
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result (first few rows):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;customer_id&lt;/th&gt;
&lt;th&gt;first_name&lt;/th&gt;
&lt;th&gt;last_name&lt;/th&gt;
&lt;th&gt;total_spent&lt;/th&gt;
&lt;th&gt;rank&lt;/th&gt;
&lt;th&gt;dense_rank&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Elizabeth&lt;/td&gt;
&lt;td&gt;Archer&lt;/td&gt;
&lt;td&gt;1599.98&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Emily&lt;/td&gt;
&lt;td&gt;King&lt;/td&gt;
&lt;td&gt;1500.00&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Karen&lt;/td&gt;
&lt;td&gt;Molina&lt;/td&gt;
&lt;td&gt;999.99&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Cynthia&lt;/td&gt;
&lt;td&gt;Lowery&lt;/td&gt;
&lt;td&gt;1200.00&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Jacob&lt;/td&gt;
&lt;td&gt;Adams&lt;/td&gt;
&lt;td&gt;599.97&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Roberta&lt;/td&gt;
&lt;td&gt;Massey&lt;/td&gt;
&lt;td&gt;499.99&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Linda&lt;/td&gt;
&lt;td&gt;Larsen&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Angela&lt;/td&gt;
&lt;td&gt;Hanson&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Whitney&lt;/td&gt;
&lt;td&gt;Wilson&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Notice how &lt;code&gt;RANK&lt;/code&gt; and &lt;code&gt;DENSE_RANK&lt;/code&gt; handle ties differently: when multiple customers have zero spending, &lt;code&gt;RANK&lt;/code&gt; skips from 7 to 10 (depending on how many zeros), while &lt;code&gt;DENSE_RANK&lt;/code&gt; assigns the same rank to all zeros and then increments by 1.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. SUM() as a Window Function – Running Totals
&lt;/h3&gt;

&lt;p&gt;Using aggregate functions like &lt;code&gt;SUM&lt;/code&gt; with &lt;code&gt;OVER&lt;/code&gt; allows you to compute a running total without collapsing rows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; Calculate a running total of sales amounts ordered by date.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT 
    sale_id,
    sale_date,
    total_amount,
    SUM(total_amount) OVER (ORDER BY sale_date) AS running_total
FROM sales
ORDER BY sale_date;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;sale_id&lt;/th&gt;
&lt;th&gt;sale_date&lt;/th&gt;
&lt;th&gt;total_amount&lt;/th&gt;
&lt;th&gt;running_total&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;2023-06-18&lt;/td&gt;
&lt;td&gt;1200.00&lt;/td&gt;
&lt;td&gt;1200.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2023-07-15&lt;/td&gt;
&lt;td&gt;999.99&lt;/td&gt;
&lt;td&gt;2199.99&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;2023-07-25&lt;/td&gt;
&lt;td&gt;599.97&lt;/td&gt;
&lt;td&gt;2799.96&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2023-08-20&lt;/td&gt;
&lt;td&gt;1599.98&lt;/td&gt;
&lt;td&gt;4399.94&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2023-09-10&lt;/td&gt;
&lt;td&gt;499.99&lt;/td&gt;
&lt;td&gt;4899.93&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each row's &lt;code&gt;running_total&lt;/code&gt; is the sum of all &lt;code&gt;total_amount&lt;/code&gt; from the earliest sale up to that row.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With partitioning:&lt;/strong&gt; You can reset the running total for each customer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT 
    customer_id,
    sale_id,
    sale_date,
    total_amount,
    SUM(total_amount) OVER (PARTITION BY customer_id ORDER BY sale_date) AS customer_running_total
FROM sales
ORDER BY customer_id, sale_date;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This would show, for each customer, the cumulative amount they've spent over time.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. LAG() and LEAD() – Accessing Other Rows
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;LAG()&lt;/code&gt; lets you access data from a previous row, while &lt;code&gt;LEAD()&lt;/code&gt; accesses a following row – great for comparing values.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; For each sale, show the previous sale amount (by date).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT 
    sale_id,
    sale_date,
    total_amount,
    LAG(total_amount) OVER (ORDER BY sale_date) AS prev_sale_amount,
    total_amount - LAG(total_amount) OVER (ORDER BY sale_date) AS difference_from_prev
FROM sales
ORDER BY sale_date;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;sale_id&lt;/th&gt;
&lt;th&gt;sale_date&lt;/th&gt;
&lt;th&gt;total_amount&lt;/th&gt;
&lt;th&gt;prev_sale_amount&lt;/th&gt;
&lt;th&gt;difference_from_prev&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;2023-06-18&lt;/td&gt;
&lt;td&gt;1200.00&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2023-07-15&lt;/td&gt;
&lt;td&gt;999.99&lt;/td&gt;
&lt;td&gt;1200.00&lt;/td&gt;
&lt;td&gt;-200.01&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;2023-07-25&lt;/td&gt;
&lt;td&gt;599.97&lt;/td&gt;
&lt;td&gt;999.99&lt;/td&gt;
&lt;td&gt;-400.02&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2023-08-20&lt;/td&gt;
&lt;td&gt;1599.98&lt;/td&gt;
&lt;td&gt;599.97&lt;/td&gt;
&lt;td&gt;1000.01&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2023-09-10&lt;/td&gt;
&lt;td&gt;499.99&lt;/td&gt;
&lt;td&gt;1599.98&lt;/td&gt;
&lt;td&gt;-1099.99&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The first row has &lt;code&gt;NULL&lt;/code&gt; because there's no previous sale.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. NTILE() – Dividing into Buckets
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;NTILE(n)&lt;/code&gt; divides rows into n roughly equal groups (buckets) based on the ordering.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; Divide customers into 4 quartiles based on total spending.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WITH customer_spending AS (
    SELECT 
        c.customer_id,
        c.first_name,
        c.last_name,
        COALESCE(SUM(s.total_amount), 0) AS total_spent
    FROM customers c
    LEFT JOIN sales s ON c.customer_id = s.customer_id
    GROUP BY c.customer_id, c.first_name, c.last_name
)
SELECT 
    customer_id,
    first_name,
    last_name,
    total_spent,
    NTILE(4) OVER (ORDER BY total_spent DESC) AS spending_quartile
FROM customer_spending
ORDER BY total_spent DESC;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result (sample):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;customer_id&lt;/th&gt;
&lt;th&gt;first_name&lt;/th&gt;
&lt;th&gt;last_name&lt;/th&gt;
&lt;th&gt;total_spent&lt;/th&gt;
&lt;th&gt;spending_quartile&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Elizabeth&lt;/td&gt;
&lt;td&gt;Archer&lt;/td&gt;
&lt;td&gt;1599.98&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Emily&lt;/td&gt;
&lt;td&gt;King&lt;/td&gt;
&lt;td&gt;1500.00&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Karen&lt;/td&gt;
&lt;td&gt;Molina&lt;/td&gt;
&lt;td&gt;999.99&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Cynthia&lt;/td&gt;
&lt;td&gt;Lowery&lt;/td&gt;
&lt;td&gt;1200.00&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Jacob&lt;/td&gt;
&lt;td&gt;Adams&lt;/td&gt;
&lt;td&gt;599.97&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Roberta&lt;/td&gt;
&lt;td&gt;Massey&lt;/td&gt;
&lt;td&gt;499.99&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Linda&lt;/td&gt;
&lt;td&gt;Larsen&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Angela&lt;/td&gt;
&lt;td&gt;Hanson&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Whitney&lt;/td&gt;
&lt;td&gt;Wilson&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Angela&lt;/td&gt;
&lt;td&gt;Atkins&lt;/td&gt;
&lt;td&gt;75.00&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Quartile 1 contains the top 25% of spenders, quartile 4 the bottom 25%.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Window Functions with PARTITION BY – Per‑Group Rankings
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;PARTITION BY&lt;/code&gt; restricts the window to rows with the same value in a column.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; Rank products by price within each category.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SELECT 
    product_id,
    product_name,
    category,
    price,
    RANK() OVER (PARTITION BY category ORDER BY price DESC) AS price_rank_in_category
FROM products
ORDER BY category, price_rank_in_category;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result (first few rows):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;product_id&lt;/th&gt;
&lt;th&gt;product_name&lt;/th&gt;
&lt;th&gt;category&lt;/th&gt;
&lt;th&gt;price&lt;/th&gt;
&lt;th&gt;price_rank_in_category&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Refrigerator&lt;/td&gt;
&lt;td&gt;Appliances&lt;/td&gt;
&lt;td&gt;1200.00&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Washing Machine&lt;/td&gt;
&lt;td&gt;Appliances&lt;/td&gt;
&lt;td&gt;499.99&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Microwave&lt;/td&gt;
&lt;td&gt;Appliances&lt;/td&gt;
&lt;td&gt;180.00&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Blender&lt;/td&gt;
&lt;td&gt;Appliances&lt;/td&gt;
&lt;td&gt;50.00&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Smart TV&lt;/td&gt;
&lt;td&gt;Electronics&lt;/td&gt;
&lt;td&gt;1500.00&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Laptop&lt;/td&gt;
&lt;td&gt;Electronics&lt;/td&gt;
&lt;td&gt;999.99&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Smartphone&lt;/td&gt;
&lt;td&gt;Electronics&lt;/td&gt;
&lt;td&gt;799.99&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Gaming Console&lt;/td&gt;
&lt;td&gt;Electronics&lt;/td&gt;
&lt;td&gt;350.00&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;Monitor&lt;/td&gt;
&lt;td&gt;Electronics&lt;/td&gt;
&lt;td&gt;250.00&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Now you can see the most expensive product in each category at a glance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Takeaways
&lt;/h3&gt;

&lt;p&gt;Window functions &lt;strong&gt;do not collapse rows&lt;/strong&gt; – they add calculated columns while preserving detail.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;PARTITION BY&lt;/code&gt; is like &lt;code&gt;GROUP BY&lt;/code&gt; within the window.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ORDER BY&lt;/code&gt; inside &lt;code&gt;OVER&lt;/code&gt; controls the order for functions like &lt;code&gt;ROW_NUMBER&lt;/code&gt;, &lt;code&gt;RANK&lt;/code&gt;, and running totals.&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;RANK&lt;/code&gt; vs &lt;code&gt;DENSE_RANK&lt;/code&gt; depending on whether you want gaps after ties.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;LAG&lt;/code&gt; and &lt;code&gt;LEAD&lt;/code&gt; are invaluable for time‑series and comparative analysis.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;NTILE&lt;/code&gt; is great for segmenting data into percentiles or quartiles.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the next section, we'll explore &lt;strong&gt;Common Table Expressions (CTEs)&lt;/strong&gt; – a powerful way to organize complex queries, especially when combined with window functions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Common Table Expressions (CTEs) – Organizing Complex Queries
&lt;/h3&gt;

&lt;p&gt;A Common Table Expression (CTE) is a temporary result set that you can reference within a &lt;code&gt;SELECT&lt;/code&gt;, &lt;code&gt;INSERT&lt;/code&gt;, &lt;code&gt;UPDATE&lt;/code&gt;, or &lt;code&gt;DELETE&lt;/code&gt; statement. Think of it as a named subquery that makes your SQL more readable, reusable, and easier to debug.&lt;/p&gt;

&lt;h3&gt;
  
  
  Basic Syntax
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WITH cte_name AS (
    -- CTE query
    SELECT ...
)
-- Main query that uses the CTE
SELECT ...
FROM cte_name;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also define multiple CTEs in one &lt;code&gt;WITH&lt;/code&gt; clause by separating them with commas:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WITH 
cte1 AS (SELECT ...),
cte2 AS (SELECT ...)
SELECT ...
FROM cte1 JOIN cte2 ON ...;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;CTEs are especially valuable when you need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Break a complex query into simple, logical steps.&lt;/li&gt;
&lt;li&gt;Reference the same subquery multiple times in a query.&lt;/li&gt;
&lt;li&gt;Create recursive queries (though we'll focus on non‑recursive here).&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Example 1: Simple CTE – Total Spending per Customer
&lt;/h4&gt;

&lt;p&gt;Suppose we want a list of customers with their total spending, but we also need to use that result in further calculations. Instead of repeating the aggregation, we can define a CTE.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WITH customer_spending AS (
    SELECT 
        c.customer_id,
        c.first_name,
        c.last_name,
        COALESCE(SUM(s.total_amount), 0) AS total_spent
    FROM customers c
    LEFT JOIN sales s ON c.customer_id = s.customer_id
    GROUP BY c.customer_id, c.first_name, c.last_name
)
SELECT 
    customer_id,
    first_name,
    last_name,
    total_spent
FROM customer_spending
ORDER BY total_spent DESC
LIMIT 5;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result (top 5 spenders):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;customer_id&lt;/th&gt;
&lt;th&gt;first_name&lt;/th&gt;
&lt;th&gt;last_name&lt;/th&gt;
&lt;th&gt;total_spent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Elizabeth&lt;/td&gt;
&lt;td&gt;Archer&lt;/td&gt;
&lt;td&gt;1599.98&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Emily&lt;/td&gt;
&lt;td&gt;King&lt;/td&gt;
&lt;td&gt;1500.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Cynthia&lt;/td&gt;
&lt;td&gt;Lowery&lt;/td&gt;
&lt;td&gt;1200.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Karen&lt;/td&gt;
&lt;td&gt;Molina&lt;/td&gt;
&lt;td&gt;999.99&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Jacob&lt;/td&gt;
&lt;td&gt;Adams&lt;/td&gt;
&lt;td&gt;599.97&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Now &lt;code&gt;customer_spending&lt;/code&gt; can be reused in the same query or even in subsequent CTEs.&lt;/p&gt;

&lt;h4&gt;
  
  
  Example 2: Chaining Multiple CTEs
&lt;/h4&gt;

&lt;p&gt;You can build one CTE on top of another. This is great for step‑wise transformations.&lt;br&gt;
&lt;strong&gt;Task:&lt;/strong&gt; Find customers who spent more than the average spending of all customers.&lt;br&gt;
We'll first compute each customer's total, then the overall average, then filter.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WITH customer_spending AS (
    SELECT 
        c.customer_id,
        c.first_name,
        c.last_name,
        COALESCE(SUM(s.total_amount), 0) AS total_spent
    FROM customers c
    LEFT JOIN sales s ON c.customer_id = s.customer_id
    GROUP BY c.customer_id, c.first_name, c.last_name
),
overall_avg AS (
    SELECT AVG(total_spent) AS avg_spent
    FROM customer_spending
)
SELECT 
    cs.customer_id,
    cs.first_name,
    cs.last_name,
    cs.total_spent,
    oa.avg_spent
FROM customer_spending cs
CROSS JOIN overall_avg oa
WHERE cs.total_spent &amp;gt; oa.avg_spent
ORDER BY cs.total_spent DESC;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result (sample):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;customer_id&lt;/th&gt;
&lt;th&gt;first_name&lt;/th&gt;
&lt;th&gt;last_name&lt;/th&gt;
&lt;th&gt;total_spent&lt;/th&gt;
&lt;th&gt;avg_spent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Elizabeth&lt;/td&gt;
&lt;td&gt;Archer&lt;/td&gt;
&lt;td&gt;1599.98&lt;/td&gt;
&lt;td&gt;287.65&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Emily&lt;/td&gt;
&lt;td&gt;King&lt;/td&gt;
&lt;td&gt;1500.00&lt;/td&gt;
&lt;td&gt;287.65&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Cynthia&lt;/td&gt;
&lt;td&gt;Lowery&lt;/td&gt;
&lt;td&gt;1200.00&lt;/td&gt;
&lt;td&gt;287.65&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Karen&lt;/td&gt;
&lt;td&gt;Molina&lt;/td&gt;
&lt;td&gt;999.99&lt;/td&gt;
&lt;td&gt;287.65&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Jacob&lt;/td&gt;
&lt;td&gt;Adams&lt;/td&gt;
&lt;td&gt;599.97&lt;/td&gt;
&lt;td&gt;287.65&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Roberta&lt;/td&gt;
&lt;td&gt;Massey&lt;/td&gt;
&lt;td&gt;499.99&lt;/td&gt;
&lt;td&gt;287.65&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;(Here the average is around 287.65 – you can see which customers are above it.)&lt;/p&gt;

&lt;h4&gt;
  
  
  Example 3: CTEs with Window Functions
&lt;/h4&gt;

&lt;p&gt;CTEs and window functions are a match made in heaven. The CTE prepares the data, and the window function adds analytics.&lt;br&gt;
&lt;strong&gt;Task:&lt;/strong&gt; Rank customers by total spending and show their quartile.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WITH customer_spending AS (
    SELECT 
        c.customer_id,
        c.first_name,
        c.last_name,
        COALESCE(SUM(s.total_amount), 0) AS total_spent
    FROM customers c
    LEFT JOIN sales s ON c.customer_id = s.customer_id
    GROUP BY c.customer_id, c.first_name, c.last_name
)
SELECT 
    customer_id,
    first_name,
    last_name,
    total_spent,
    RANK() OVER (ORDER BY total_spent DESC) AS spending_rank,
    NTILE(4) OVER (ORDER BY total_spent DESC) AS spending_quartile
FROM customer_spending
ORDER BY total_spent DESC;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result (first few rows):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;customer_id&lt;/th&gt;
&lt;th&gt;first_name&lt;/th&gt;
&lt;th&gt;last_name&lt;/th&gt;
&lt;th&gt;total_spent&lt;/th&gt;
&lt;th&gt;spending_rank&lt;/th&gt;
&lt;th&gt;spending_quartile&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Elizabeth&lt;/td&gt;
&lt;td&gt;Archer&lt;/td&gt;
&lt;td&gt;1599.98&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Emily&lt;/td&gt;
&lt;td&gt;King&lt;/td&gt;
&lt;td&gt;1500.00&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Cynthia&lt;/td&gt;
&lt;td&gt;Lowery&lt;/td&gt;
&lt;td&gt;1200.00&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Karen&lt;/td&gt;
&lt;td&gt;Molina&lt;/td&gt;
&lt;td&gt;999.99&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Jacob&lt;/td&gt;
&lt;td&gt;Adams&lt;/td&gt;
&lt;td&gt;599.97&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Roberta&lt;/td&gt;
&lt;td&gt;Massey&lt;/td&gt;
&lt;td&gt;499.99&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Linda&lt;/td&gt;
&lt;td&gt;Larsen&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Angela&lt;/td&gt;
&lt;td&gt;Hanson&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Whitney&lt;/td&gt;
&lt;td&gt;Wilson&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Angela&lt;/td&gt;
&lt;td&gt;Atkins&lt;/td&gt;
&lt;td&gt;75.00&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Notice how &lt;code&gt;RANK&lt;/code&gt; handles ties (customers with zero spending share rank 7) and &lt;code&gt;NTILE&lt;/code&gt; distributes rows into four buckets as evenly as possible.&lt;/p&gt;

&lt;h4&gt;
  
  
  Example 4: CTEs for Readability – Multi‑Step Analysis
&lt;/h4&gt;

&lt;p&gt;Imagine you need to produce a report that shows, for each product category, the top‑selling product and its sales. Without CTEs, the query could become a tangled mess of subqueries. With CTEs, it's clean:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WITH category_sales AS (
    SELECT 
        p.category,
        p.product_name,
        SUM(s.total_amount) AS product_sales
    FROM products p
    LEFT JOIN sales s ON p.product_id = s.product_id
    GROUP BY p.category, p.product_name
),
ranked_products AS (
    SELECT 
        category,
        product_name,
        product_sales,
        RANK() OVER (PARTITION BY category ORDER BY product_sales DESC) AS rank
    FROM category_sales
)
SELECT 
    category,
    product_name,
    product_sales
FROM ranked_products
WHERE rank = 1;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;category&lt;/th&gt;
&lt;th&gt;product_name&lt;/th&gt;
&lt;th&gt;product_sales&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Appliances&lt;/td&gt;
&lt;td&gt;Refrigerator&lt;/td&gt;
&lt;td&gt;1200.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Electronics&lt;/td&gt;
&lt;td&gt;Smart TV&lt;/td&gt;
&lt;td&gt;1500.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Accessories&lt;/td&gt;
&lt;td&gt;Headphones&lt;/td&gt;
&lt;td&gt;599.97&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;(Assuming only a few sales; your actual data may vary.)&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Takeaways
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CTEs improve readability&lt;/strong&gt; by letting you name intermediate results.&lt;/li&gt;
&lt;li&gt;They are &lt;strong&gt;not materialized&lt;/strong&gt; (in most databases) – they're just syntax, but they make complex queries easier to write and debug.&lt;/li&gt;
&lt;li&gt;You can use them with &lt;strong&gt;joins, window functions, and aggregations.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiple CTEs&lt;/strong&gt; can be chained to build up a solution step by step.&lt;/li&gt;
&lt;li&gt;They are &lt;strong&gt;essential for recursive queries&lt;/strong&gt; (though we didn't cover that here).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now that you've seen how CTEs can organize even the most intricate logic, you're ready to tackle advanced analytical queries that combine all these concepts.&lt;/p&gt;

&lt;p&gt;In the final section, we'll bring everything together with some &lt;strong&gt;real‑world analytical problems&lt;/strong&gt; that use joins, window functions, and CTEs in harmony.&lt;/p&gt;

&lt;h3&gt;
  
  
  Putting It All Together – Real‑World Analytical Problems
&lt;/h3&gt;

&lt;p&gt;Now that we've covered joins, window functions, and CTEs individually, let's combine them to solve realistic business questions. These examples show how these tools work in harmony to extract valuable insights from our e‑commerce data.&lt;/p&gt;

&lt;h4&gt;
  
  
  Example 1: Top 3 Customers by Spending in Each Membership Tier
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Business question:&lt;/strong&gt; For each membership tier (Bronze, Silver, Gold), who are the top 3 customers by total spending?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;CTE&lt;/strong&gt; to compute total spending per customer, including their membership status.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Window function&lt;/strong&gt; &lt;code&gt;RANK()&lt;/code&gt; partitioned by membership status to rank customers within their tier.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filter&lt;/strong&gt; to keep only ranks 1–3.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WITH customer_spending AS (
    SELECT 
        c.customer_id,
        c.first_name,
        c.last_name,
        c.membership_status,
        COALESCE(SUM(s.total_amount), 0) AS total_spent
    FROM customers c
    LEFT JOIN sales s ON c.customer_id = s.customer_id
    GROUP BY c.customer_id, c.first_name, c.last_name, c.membership_status
),
ranked_customers AS (
    SELECT 
        customer_id,
        first_name,
        last_name,
        membership_status,
        total_spent,
        RANK() OVER (PARTITION BY membership_status ORDER BY total_spent DESC) AS rank_in_tier
    FROM customer_spending
)
SELECT 
    membership_status,
    rank_in_tier,
    customer_id,
    first_name,
    last_name,
    total_spent
FROM ranked_customers
WHERE rank_in_tier &amp;lt;= 3
ORDER BY membership_status, rank_in_tier;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result (sample from our data):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;membership_status&lt;/th&gt;
&lt;th&gt;rank_in_tier&lt;/th&gt;
&lt;th&gt;customer_id&lt;/th&gt;
&lt;th&gt;first_name&lt;/th&gt;
&lt;th&gt;last_name&lt;/th&gt;
&lt;th&gt;total_spent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Bronze&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Karen&lt;/td&gt;
&lt;td&gt;Molina&lt;/td&gt;
&lt;td&gt;999.99&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bronze&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Roberta&lt;/td&gt;
&lt;td&gt;Massey&lt;/td&gt;
&lt;td&gt;499.99&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bronze&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Angela&lt;/td&gt;
&lt;td&gt;Hanson&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gold&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Jacob&lt;/td&gt;
&lt;td&gt;Adams&lt;/td&gt;
&lt;td&gt;599.97&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gold&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;James&lt;/td&gt;
&lt;td&gt;Rodriguez&lt;/td&gt;
&lt;td&gt;350.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gold&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;Steven&lt;/td&gt;
&lt;td&gt;Burnett&lt;/td&gt;
&lt;td&gt;180.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Silver&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Elizabeth&lt;/td&gt;
&lt;td&gt;Archer&lt;/td&gt;
&lt;td&gt;1599.98&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Silver&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Cynthia&lt;/td&gt;
&lt;td&gt;Lowery&lt;/td&gt;
&lt;td&gt;1200.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Silver&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Emily&lt;/td&gt;
&lt;td&gt;King&lt;/td&gt;
&lt;td&gt;1500.00&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This query combines a &lt;strong&gt;LEFT JOIN&lt;/strong&gt; (to include customers with no purchases), a &lt;strong&gt;CTE&lt;/strong&gt; for the base aggregation, a window function for ranking, and finally a &lt;strong&gt;filter&lt;/strong&gt; on the rank. The result gives us a quick view of the best customers in each tier.&lt;/p&gt;

&lt;h4&gt;
  
  
  Example 2: Product Performance vs. Category Average
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Business question:&lt;/strong&gt; For each product, show its total sales and how much it differs from the average sales of products in the same category. Which products are outperforming their category?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;First CTE&lt;/strong&gt; computes total sales per product with category.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Second CTE&lt;/strong&gt; computes average sales per category.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Main query&lt;/strong&gt; joins the two and calculates the difference.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WITH product_sales AS (
    SELECT 
        p.category,
        p.product_id,
        p.product_name,
        COALESCE(SUM(s.total_amount), 0) AS product_total
    FROM products p
    LEFT JOIN sales s ON p.product_id = s.product_id
    GROUP BY p.category, p.product_id, p.product_name
),
category_avg AS (
    SELECT 
        category,
        AVG(product_total) AS avg_in_category
    FROM product_sales
    GROUP BY category
)
SELECT 
    ps.category,
    ps.product_name,
    ps.product_total,
    ca.avg_in_category,
    ps.product_total - ca.avg_in_category AS difference_from_avg
FROM product_sales ps
JOIN category_avg ca ON ps.category = ca.category
ORDER BY ps.category, difference_from_avg DESC;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result (sample):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;category&lt;/th&gt;
&lt;th&gt;product_name&lt;/th&gt;
&lt;th&gt;product_total&lt;/th&gt;
&lt;th&gt;avg_in_category&lt;/th&gt;
&lt;th&gt;difference_from_avg&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Accessories&lt;/td&gt;
&lt;td&gt;Headphones&lt;/td&gt;
&lt;td&gt;599.97&lt;/td&gt;
&lt;td&gt;99.99&lt;/td&gt;
&lt;td&gt;499.98&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Accessories&lt;/td&gt;
&lt;td&gt;Wireless Mouse&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;td&gt;99.99&lt;/td&gt;
&lt;td&gt;-99.99&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Accessories&lt;/td&gt;
&lt;td&gt;Keyboard&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;td&gt;99.99&lt;/td&gt;
&lt;td&gt;-99.99&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Appliances&lt;/td&gt;
&lt;td&gt;Refrigerator&lt;/td&gt;
&lt;td&gt;1200.00&lt;/td&gt;
&lt;td&gt;244.99&lt;/td&gt;
&lt;td&gt;955.01&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Appliances&lt;/td&gt;
&lt;td&gt;Washing Machine&lt;/td&gt;
&lt;td&gt;499.99&lt;/td&gt;
&lt;td&gt;244.99&lt;/td&gt;
&lt;td&gt;255.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Appliances&lt;/td&gt;
&lt;td&gt;Microwave&lt;/td&gt;
&lt;td&gt;180.00&lt;/td&gt;
&lt;td&gt;244.99&lt;/td&gt;
&lt;td&gt;-64.99&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Appliances&lt;/td&gt;
&lt;td&gt;Blender&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;td&gt;244.99&lt;/td&gt;
&lt;td&gt;-244.99&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Electronics&lt;/td&gt;
&lt;td&gt;Smart TV&lt;/td&gt;
&lt;td&gt;1500.00&lt;/td&gt;
&lt;td&gt;424.99&lt;/td&gt;
&lt;td&gt;1075.01&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Electronics&lt;/td&gt;
&lt;td&gt;Laptop&lt;/td&gt;
&lt;td&gt;999.99&lt;/td&gt;
&lt;td&gt;424.99&lt;/td&gt;
&lt;td&gt;575.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Electronics&lt;/td&gt;
&lt;td&gt;Smartphone&lt;/td&gt;
&lt;td&gt;1599.98&lt;/td&gt;
&lt;td&gt;424.99&lt;/td&gt;
&lt;td&gt;1174.99&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Electronics&lt;/td&gt;
&lt;td&gt;Gaming Console&lt;/td&gt;
&lt;td&gt;350.00&lt;/td&gt;
&lt;td&gt;424.99&lt;/td&gt;
&lt;td&gt;-74.99&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Electronics&lt;/td&gt;
&lt;td&gt;Monitor&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;td&gt;424.99&lt;/td&gt;
&lt;td&gt;-424.99&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Electronics&lt;/td&gt;
&lt;td&gt;External Hard Drive&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;td&gt;424.99&lt;/td&gt;
&lt;td&gt;-424.99&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Electronics&lt;/td&gt;
&lt;td&gt;Tablet&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;td&gt;424.99&lt;/td&gt;
&lt;td&gt;-424.99&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Electronics&lt;/td&gt;
&lt;td&gt;Smartwatch&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;td&gt;424.99&lt;/td&gt;
&lt;td&gt;-424.99&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Here, joins link products to sales, &lt;strong&gt;CTEs&lt;/strong&gt; structure the intermediate calculations, and the final ORDER BY highlights top performers. We can immediately see which products are stars and which are lagging in their category.&lt;/p&gt;

&lt;h4&gt;
  
  
  Example 3: Customers Who Made Consecutive Purchases Within 7 Days
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Business question:&lt;/strong&gt; Identify customers who have made purchases within 7 days of a previous purchase – a sign of engaged shoppers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Use &lt;code&gt;LAG()&lt;/code&gt; window function to get the previous purchase date for each customer.&lt;/li&gt;
&lt;li&gt;Compute the date difference.&lt;/li&gt;
&lt;li&gt;Filter where the gap is ≤ 7 days.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WITH customer_purchases AS (
    SELECT 
        customer_id,
        sale_date,
        LAG(sale_date) OVER (PARTITION BY customer_id ORDER BY sale_date) AS prev_purchase_date
    FROM sales
)
SELECT DISTINCT
    c.customer_id,
    c.first_name,
    c.last_name,
    cp.sale_date AS purchase_date,
    cp.prev_purchase_date,
    (cp.sale_date - cp.prev_purchase_date) AS days_between
FROM customer_purchases cp
JOIN customers c ON cp.customer_id = c.customer_id
WHERE cp.prev_purchase_date IS NOT NULL
  AND (cp.sale_date - cp.prev_purchase_date) &amp;lt;= 7
ORDER BY c.customer_id, cp.sale_date;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result (In respect to our data)&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;customer_id&lt;/th&gt;
&lt;th&gt;first_name&lt;/th&gt;
&lt;th&gt;last_name&lt;/th&gt;
&lt;th&gt;purchase_date&lt;/th&gt;
&lt;th&gt;prev_purchase_date&lt;/th&gt;
&lt;th&gt;days_between&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Karen&lt;/td&gt;
&lt;td&gt;Molina&lt;/td&gt;
&lt;td&gt;2023-07-15&lt;/td&gt;
&lt;td&gt;2023-07-10&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Cynthia&lt;/td&gt;
&lt;td&gt;Lowery&lt;/td&gt;
&lt;td&gt;2023-06-18&lt;/td&gt;
&lt;td&gt;2023-06-12&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This query uses a window function &lt;code&gt;(LAG)&lt;/code&gt; within a &lt;strong&gt;CTE&lt;/strong&gt;, then &lt;strong&gt;joins&lt;/strong&gt; with the customers table to get names, and finally filters. It demonstrates how window functions can identify sequential patterns.&lt;/p&gt;

&lt;h4&gt;
  
  
  Example 4: Monthly Sales Trends – Running Total and Month‑over‑Month Change
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Business question:&lt;/strong&gt; Show monthly sales totals, a running total, and the percentage change from the previous month.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;CTE&lt;/strong&gt; to aggregate sales by month.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Window functions&lt;/strong&gt; for running total (&lt;code&gt;SUM() OVER&lt;/code&gt;) and previous month's sales (&lt;code&gt;LAG&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Compute the change.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WITH monthly_sales AS (
    SELECT 
        DATE_TRUNC('month', sale_date)::DATE AS month,
        SUM(total_amount) AS monthly_total
    FROM sales
    GROUP BY DATE_TRUNC('month', sale_date)
)
SELECT 
    month,
    monthly_total,
    SUM(monthly_total) OVER (ORDER BY month) AS running_total,
    LAG(monthly_total) OVER (ORDER BY month) AS prev_month_total,
    CASE 
        WHEN LAG(monthly_total) OVER (ORDER BY month) IS NOT NULL 
        THEN ROUND(100.0 * (monthly_total - LAG(monthly_total) OVER (ORDER BY month)) / LAG(monthly_total) OVER (ORDER BY month), 2)
        ELSE NULL 
    END AS pct_change
FROM monthly_sales
ORDER BY month;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result;&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;month&lt;/th&gt;
&lt;th&gt;monthly_total&lt;/th&gt;
&lt;th&gt;running_total&lt;/th&gt;
&lt;th&gt;prev_month_total&lt;/th&gt;
&lt;th&gt;pct_change&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2023-06-01&lt;/td&gt;
&lt;td&gt;1200.00&lt;/td&gt;
&lt;td&gt;1200.00&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2023-07-01&lt;/td&gt;
&lt;td&gt;1599.96&lt;/td&gt;
&lt;td&gt;2799.96&lt;/td&gt;
&lt;td&gt;1200.00&lt;/td&gt;
&lt;td&gt;33.33&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2023-08-01&lt;/td&gt;
&lt;td&gt;1599.98&lt;/td&gt;
&lt;td&gt;4399.94&lt;/td&gt;
&lt;td&gt;1599.96&lt;/td&gt;
&lt;td&gt;0.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2023-09-01&lt;/td&gt;
&lt;td&gt;499.99&lt;/td&gt;
&lt;td&gt;4899.93&lt;/td&gt;
&lt;td&gt;1599.98&lt;/td&gt;
&lt;td&gt;-68.75&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Here, the &lt;strong&gt;CTE&lt;/strong&gt; aggregates data, and &lt;strong&gt;window functions&lt;/strong&gt; add running total and previous‑month comparison. This gives a concise view of sales momentum.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Throughout this article, we've explored three powerful SQL concepts—&lt;strong&gt;joins&lt;/strong&gt;, &lt;strong&gt;window functions&lt;/strong&gt;, and &lt;strong&gt;CTEs&lt;/strong&gt;—using a practical e‑commerce dataset. As a data professional, knowing when and how to apply these tools is essential for building efficient, readable, and maintainable queries.&lt;/p&gt;

&lt;p&gt;Here's a quick reference guide based on what we've learned:&lt;/p&gt;

&lt;h4&gt;
  
  
  Use Window Functions When:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;You need to perform calculations across rows &lt;strong&gt;without collapsing&lt;/strong&gt; the result set (e.g., running totals, moving averages).&lt;/li&gt;
&lt;li&gt;You're working with &lt;strong&gt;rankings&lt;/strong&gt; or &lt;strong&gt;percentiles&lt;/strong&gt; (e.g., top customers by spending, quartiles).&lt;/li&gt;
&lt;li&gt;You need to &lt;strong&gt;compare values between rows&lt;/strong&gt; (e.g., previous sale amount, month‑over‑month change).&lt;/li&gt;
&lt;li&gt;Your analysis requires &lt;strong&gt;retaining detail&lt;/strong&gt; while still gaining aggregate insights.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Use CTEs When:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;You're breaking down a &lt;strong&gt;complex query&lt;/strong&gt; into simpler, logical steps.&lt;/li&gt;
&lt;li&gt;The same intermediate result needs to be &lt;strong&gt;referenced multiple times&lt;/strong&gt; in a query.&lt;/li&gt;
&lt;li&gt;You want to improve &lt;strong&gt;readability&lt;/strong&gt; and make your code easier to debug.&lt;/li&gt;
&lt;li&gt;You're preparing data for &lt;strong&gt;window functions&lt;/strong&gt; or further transformations.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Use Joins When:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;You need to &lt;strong&gt;combine data from multiple related tables&lt;/strong&gt; (e.g., customers with their orders).&lt;/li&gt;
&lt;li&gt;Your database is &lt;strong&gt;normalized&lt;/strong&gt; and information is spread across tables.&lt;/li&gt;
&lt;li&gt;You're building &lt;strong&gt;reporting datasets&lt;/strong&gt; that require fields from different sources.&lt;/li&gt;
&lt;li&gt;You need to understand &lt;strong&gt;relationships&lt;/strong&gt; between entities in your data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By mastering these advanced SQL concepts, you're not just writing queries—you're building a toolkit for solving real business problems. Whether it's identifying top‑performing products, understanding customer behavior, or tracking sales trends over time, the techniques you've learned here will serve you across industries and datasets.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;u&gt;&lt;strong&gt;&lt;em&gt;Data Analysis Step by Step;&lt;/em&gt;&lt;/strong&gt;&lt;/u&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;1st Read: Git &amp;amp; Github Beginner's guide&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you’re also learning version control with Git, you can read my Git &amp;amp; GitHub beginner’s guide here: &lt;br&gt;
&lt;a href="https://dev.to/charles_ndungu/git-for-data-scientists-data-engineers-my-very-first-beginner-guide-git-bash-github-3952"&gt;https://dev.to/charles_ndungu/git-for-data-scientists-data-engineers-my-very-first-beginner-guide-git-bash-github-3952&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;2nd Read: Mastering Excel&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After mastering Git basics, you can learn how to analyze data using Microsoft Excel here:&lt;br&gt;
&lt;a href="https://dev.to/charles_ndungu/ms-excel-for-data-analytics-a-friendly-practical-guide-for-beginners-hjn"&gt;https://dev.to/charles_ndungu/ms-excel-for-data-analytics-a-friendly-practical-guide-for-beginners-hjn&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;3rd Read: Data Modelling &amp;amp; Schemas&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This article dives into data modelling in Power BI, covering star and snowflake schemas, fact and dimension tables, relationships, and why good modelling is essential for accurate insights and fast reports.&lt;br&gt;
&lt;a href="https://dev.to/charles_ndungu/the-backbone-of-power-bi-a-deep-dive-into-data-modeling-schemas-1o1l"&gt;https://dev.to/charles_ndungu/the-backbone-of-power-bi-a-deep-dive-into-data-modeling-schemas-1o1l&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;4th Read: Data Analysis Steps in Power BI&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This article reveals how Power BI analysts act as data translators, bridging the gap between messy data and clear business action. We break down their essential three-step process: cleaning raw information, encoding logic with DAX, and designing dashboards that drive real decisions.&lt;br&gt;
&lt;a href="https://dev.to/charles_ndungu/from-raw-data-to-real-action-the-analysts-journey-as-a-data-translator-in-power-bi-2gl6"&gt;https://dev.to/charles_ndungu/from-raw-data-to-real-action-the-analysts-journey-as-a-data-translator-in-power-bi-2gl6&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;5th Read: From Tables to Insights – A SQL Masterclass&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Data lives scattered across tables—customers here, products there, sales everywhere. This article is your guide to bringing it all together. Follow along with a real online store dataset as we build queries that rank top customers, spot sales trends, and uncover hidden patterns. By the end, you won't just write SQL—you'll tell stories with data.&lt;br&gt;
&lt;a href="https://dev.to/charles_ndungu/mastering-sql-joins-and-window-functions-a-practical-guide-with-an-e-commerce-dataset-ejg"&gt;https://dev.to/charles_ndungu/mastering-sql-joins-and-window-functions-a-practical-guide-with-an-e-commerce-dataset-ejg&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repo&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Charles-Ndungu/excel-for-data-analytics" rel="noopener noreferrer"&gt;https://github.com/Charles-Ndungu/excel-for-data-analytics&lt;/a&gt;&lt;/p&gt;

</description>
      <category>sql</category>
      <category>database</category>
      <category>softwaredevelopment</category>
      <category>datascience</category>
    </item>
    <item>
      <title>From Raw Data to Real Action: The Analyst's Journey as a Data Translator in Power BI</title>
      <dc:creator>Charles</dc:creator>
      <pubDate>Sun, 08 Feb 2026 18:54:55 +0000</pubDate>
      <link>https://dev.to/charles_ndungu/from-raw-data-to-real-action-the-analysts-journey-as-a-data-translator-in-power-bi-2gl6</link>
      <guid>https://dev.to/charles_ndungu/from-raw-data-to-real-action-the-analysts-journey-as-a-data-translator-in-power-bi-2gl6</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fidu4alxb8jqtjdqagcs7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fidu4alxb8jqtjdqagcs7.png" alt="Power BI" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In today's organizations, a critical gap persists. On one side, executives and managers demand clear, immediate answers to urgent questions: &lt;em&gt;"Are we on track to hit our quarterly targets?"&lt;/em&gt; &lt;em&gt;"Which product line is underperforming and why?"&lt;/em&gt; On the other side lies the reality of modern data: a sprawling, chaotic landscape of spreadsheets, databases, and legacy systems—each with its own inconsistencies, errors, and obscure logic.&lt;/p&gt;

&lt;p&gt;Bridging this gap is the fundamental role of the data analyst. But to call them mere "number crunchers" is a profound understatement. A more apt description is that of a translator. An analyst's core skill is not just proficiency with tools, but the ability to interpret the raw, technical "language" of disparate systems and translate it into the clear, actionable "language" of business decisions.&lt;/p&gt;

&lt;p&gt;This translation is a disciplined, three-act process. It begins with taming chaos into a trusted foundation, moves to encoding complex business logic into dynamic calculations, and culminates in designing a compelling narrative that drives action.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deciphering the Chaos – Translating Raw Data into a Trusted Foundation
&lt;/h3&gt;

&lt;p&gt;The analyst's first task is to confront the "source text": the raw data. This is rarely clean. It's more likely a collection of CSV files with different date formats, a Salesforce report with merged header cells, and a SQL table where the "Region" column suddenly changed from "EMEA" to "Europe &amp;amp; Middle East."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2wp3kio30lsg3jwtwbs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2wp3kio30lsg3jwtwbs.png" alt="Data Cleaning" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Translator's First Tool: Power Query&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is where Power Query, Power BI's data transformation engine, moves from being a feature to being a philosophy. Its purpose is not to apply a one-time fix, but to build a single, reproducible source of truth. Every step you record—removing a column, splitting a field, merging a table—is saved as a recipe. The next time data refreshes, the recipe runs automatically, ensuring consistency and freeing you from manual, error-prone cleaning.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxcuqjij3oiab7hca0804.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxcuqjij3oiab7hca0804.png" alt="Power Query Interface" width="800" height="309"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Here’s how a translator thinks within Power Query:
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Filter at the Source, Not the End:&lt;/strong&gt; A common rookie mistake is to load 10 years of historical data only to analyze the last quarter. A skilled translator uses Power Query's "Filter Rows" step early in the process to load only the necessary data. This dramatically improves performance and model refresh times.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvusq1jzm2gungpdfydgz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvusq1jzm2gungpdfydgz.png" alt="Filtering Pivot" width="800" height="430"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pivot and Unpivot Thoughtfully:&lt;/strong&gt; Data often arrives in a "wide" format convenient for human reading but terrible for analysis. A sales report might have columns for Jan_Sales, Feb_Sales, Mar_Sales. A translator "unpivots" these into two columns: Month and Sales. This long format is what Power BI's relationships and calculations need to work efficiently.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2wromg43saywuh4sbd4b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2wromg43saywuh4sbd4b.png" alt="Pivot &amp;amp; unpivoting" width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Leverage Custom Columns for Logic:&lt;/strong&gt; Need to categorize customers based on purchase frequency or flag orders that exceed a certain threshold? Instead of doing this later in DAX (which can hurt performance), create a Conditional Column in Power Query during the data prep phase. This logic becomes part of your stable data foundation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl0xuopwl2c1ppj60tv5b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl0xuopwl2c1ppj60tv5b.png" alt="Custom Column" width="800" height="498"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The output is no longer just "data." It is a structured, reliable, and analysis-ready dataset. The chaos has been translated into order, setting the stage for the next phase: &lt;strong&gt;adding intelligence.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Embedding Intelligence – Translating Business Questions into DAX
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgl0rh80094xwj0c87wlv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgl0rh80094xwj0c87wlv.png" alt="Data Decision" width="800" height="488"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With clean tables related in a star schema, the analyst now faces the core translation challenge: turning stakeholder questions into calculated answers. This is the realm of &lt;strong&gt;Data Analysis Expressions (DAX)&lt;/strong&gt;, the formula language of Power BI.&lt;/p&gt;

&lt;p&gt;DAX is more than a collection of functions; it is the syntax for expressing business rules. A question like &lt;em&gt;"What were our sales this month compared to the same month last year, but only for our premium product segment?"&lt;/em&gt; requires a precise translation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu73i40hurkfotnbqp1u7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu73i40hurkfotnbqp1u7.png" alt="Dax" width="800" height="1371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Moving Beyond Basic Aggregation: The Art of Context&lt;/strong&gt;&lt;br&gt;
The power of DAX lies in its understanding of &lt;strong&gt;context&lt;/strong&gt;. A simple measure Total Sales = SUM(Sales[Amount]) behaves differently depending on where it's used. Put it in a card visual, it shows the grand total. Put it in a table sliced by Region, it automatically shows the total per region. This is filter context in action.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6d19z8aagcg9p7ayvrnc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6d19z8aagcg9p7ayvrnc.png" alt="Sum Function" width="800" height="399"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The translator uses advanced functions to manipulate this context and answer complex questions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Time Intelligence for Trend Translation:&lt;/strong&gt; Questions about growth and trends are fundamental. DAX provides dedicated time intelligence functions to translate them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Sales PY = CALCULATE([Total Sales], SAMEPERIODLASTYEAR('Date'[Date]))
Sales Growth % = DIVIDE([Total Sales] - [Sales PY], [Sales PY])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code seamlessly calculates prior year sales and the percentage growth, regardless of whether the user is looking at a day, month, or quarter.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcvpnh5zn9832u6tv5wrl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcvpnh5zn9832u6tv5wrl.png" alt="Sum Function" width="800" height="428"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The&lt;/strong&gt; CALCULATE &lt;strong&gt;Function&lt;/strong&gt;: &lt;strong&gt;The Master Translator&lt;/strong&gt;: CALCULATE is the most important function in DAX. It modifies the context of a calculation. It's how you answer "what if" questions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Sales for Premium Products =CALCULATE([Total Sales],'Product'[Segment] = "Premium")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This measure translates the question "What are sales, but only for premium products?" into a dynamic calculation that respects all other filters on the report.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5dxyb497zuybcpu6b0fh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5dxyb497zuybcpu6b0fh.png" alt="Formulas" width="800" height="455"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Writing for Readability&lt;/strong&gt;: &lt;strong&gt;The&lt;/strong&gt; VAR &lt;strong&gt;Keyword&lt;/strong&gt;: Good translators make complex logic understandable. In DAX, the VAR (variable) statement is essential for this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Profit Margin % =
VAR TotalProfit = SUM(Sales[Profit])
VAR TotalRevenue = SUM(Sales[Revenue])
RETURN DIVIDE(TotalProfit, TotalRevenue, 0)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This breaks the calculation into logical steps, making it easier to debug, modify, and explain to others.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fafwjzhpgqj1b680l78v3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fafwjzhpgqj1b680l78v3.png" alt="Var Table" width="800" height="367"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The output is a suite of dynamic measures. The dataset is now imbued with business logic, capable of answering nuanced questions interactively. The data is intelligent, but it is not yet a story.&lt;/p&gt;

&lt;h3&gt;
  
  
  Narrating for Action – Translating Insights into Compelling Dashboards
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4s8tulx3pg2bzaz6myw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4s8tulx3pg2bzaz6myw.png" alt="Dashboard" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The final and most critical translation is from insight to action. A dashboard is not a data dump; it is a &lt;strong&gt;visual argument&lt;/strong&gt; and a &lt;strong&gt;guidance system&lt;/strong&gt;. Its success is measured not by how many charts it contains, but by how quickly it leads a user to a confident decision.&lt;/p&gt;

&lt;h3&gt;
  
  
  Design Principles for the Decision-Maker
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;A translator designs with empathy for the audience:&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Five-Second Rule&lt;/strong&gt;: The primary objective of the entire page should be understood within five seconds. This is achieved through a clear visual hierarchy: a prominent KPI header at the top, supporting trend charts in the middle, and detailed breakdowns at the bottom.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Guided Interactivity, Not Just Features&lt;/strong&gt;: Slicers, cross-filtering, and drill-throughs are powerful, but they must serve the narrative. A well-designed dashboard uses bookmarks to create "guided analytical stories"—clicking a button might reset filters, highlight a key trend, and bring a specific detail page to the forefront, leading the user down a pre-defined analytical path.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Leverage the Full Ecosystem&lt;/strong&gt;: Power BI is more than a canvas. The translator uses Data Alerts to proactively notify stakeholders when a KPI crosses a threshold, turning a passive report into an active monitoring tool. They enable the Q&amp;amp;A feature, allowing users to ask questions in natural language ("show me sales by region last quarter"), fostering a conversational relationship with the data.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Virtuous Cycle of Informed Action
&lt;/h3&gt;

&lt;p&gt;The journey of the data translator in Power BI is a continuous, virtuous cycle: &lt;strong&gt;Chaos&lt;/strong&gt; → &lt;strong&gt;Structure&lt;/strong&gt; → &lt;strong&gt;Logic&lt;/strong&gt; → &lt;strong&gt;Narrative&lt;/strong&gt; → &lt;strong&gt;Action&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Each decision made from a well-crafted dashboard generates new data and new questions, which flow back to the analyst. This starts the translation process anew, creating a resilient loop of increasingly informed decision-making.&lt;/p&gt;

&lt;p&gt;The true power of an analyst, therefore, lies not in memorizing every DAX function or mastering every visualization, but in architecting and sustaining this cycle. It is the deep understanding that their role is to be the essential, human link between the raw potential of data and the tangible progress of the business. By embracing the discipline of translation, they move from being reporters of the past to becoming indispensable guides to the future.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;u&gt;&lt;strong&gt;&lt;em&gt;Data Analysis Step by Step;&lt;/em&gt;&lt;/strong&gt;&lt;/u&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;1st Read: Git &amp;amp; Github Beginner's guide&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you’re also learning version control with Git, you can read my Git &amp;amp; GitHub beginner’s guide here: &lt;br&gt;
&lt;a href="https://dev.to/charles_ndungu/git-for-data-scientists-data-engineers-my-very-first-beginner-guide-git-bash-github-3952"&gt;https://dev.to/charles_ndungu/git-for-data-scientists-data-engineers-my-very-first-beginner-guide-git-bash-github-3952&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;2nd Read: Mastering Excel&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After mastering Git basics, you can learn how to analyze data using Microsoft Excel here:&lt;br&gt;
&lt;a href="https://dev.to/charles_ndungu/ms-excel-for-data-analytics-a-friendly-practical-guide-for-beginners-hjn"&gt;https://dev.to/charles_ndungu/ms-excel-for-data-analytics-a-friendly-practical-guide-for-beginners-hjn&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;3rd Read: Data Modelling &amp;amp; Schemas&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This article dives into data modelling in Power BI, covering star and snowflake schemas, fact and dimension tables, relationships, and why good modelling is essential for accurate insights and fast reports.&lt;br&gt;
&lt;a href="https://dev.to/charles_ndungu/the-backbone-of-power-bi-a-deep-dive-into-data-modeling-schemas-1o1l"&gt;https://dev.to/charles_ndungu/the-backbone-of-power-bi-a-deep-dive-into-data-modeling-schemas-1o1l&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;4th Read: Data Analysis Steps in Power BI&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This article reveals how Power BI analysts act as data translators, bridging the gap between messy data and clear business action. We break down their essential three-step process: cleaning raw information, encoding logic with DAX, and designing dashboards that drive real decisions.&lt;br&gt;
&lt;a href="https://dev.to/charles_ndungu/from-raw-data-to-real-action-the-analysts-journey-as-a-data-translator-in-power-bi-2gl6"&gt;https://dev.to/charles_ndungu/from-raw-data-to-real-action-the-analysts-journey-as-a-data-translator-in-power-bi-2gl6&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;5th Read: From Tables to Insights – A SQL Masterclass&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Data lives scattered across tables—customers here, products there, sales everywhere. This article is your guide to bringing it all together. Follow along with a real online store dataset as we build queries that rank top customers, spot sales trends, and uncover hidden patterns. By the end, you won't just write SQL—you'll tell stories with data.&lt;br&gt;
&lt;a href="https://dev.to/charles_ndungu/mastering-sql-joins-and-window-functions-a-practical-guide-with-an-e-commerce-dataset-ejg"&gt;https://dev.to/charles_ndungu/mastering-sql-joins-and-window-functions-a-practical-guide-with-an-e-commerce-dataset-ejg&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repo&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Charles-Ndungu/excel-for-data-analytics" rel="noopener noreferrer"&gt;https://github.com/Charles-Ndungu/excel-for-data-analytics&lt;/a&gt;&lt;/p&gt;

</description>
      <category>dashboard</category>
      <category>datamodeling</category>
      <category>dax</category>
      <category>powerbi</category>
    </item>
    <item>
      <title>The Backbone of Power BI: A Deep Dive into Data Modeling &amp; Schemas</title>
      <dc:creator>Charles</dc:creator>
      <pubDate>Sat, 31 Jan 2026 15:38:07 +0000</pubDate>
      <link>https://dev.to/charles_ndungu/the-backbone-of-power-bi-a-deep-dive-into-data-modeling-schemas-1o1l</link>
      <guid>https://dev.to/charles_ndungu/the-backbone-of-power-bi-a-deep-dive-into-data-modeling-schemas-1o1l</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk5695mkr6ipr26fvak45.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk5695mkr6ipr26fvak45.png" alt="DATA R/SHIP" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://images.unsplash.com/photo-1551288049-bebda4e38f71?ixlib=rb-4.0.3&amp;amp;auto=format&amp;amp;fit=crop&amp;amp;w=1200&amp;amp;q=80" rel="noopener noreferrer"&gt;https://images.unsplash.com/photo-1551288049-bebda4e38f71?ixlib=rb-4.0.3&amp;amp;auto=format&amp;amp;fit=crop&amp;amp;w=1200&amp;amp;q=80&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;Power BI is a powerful tool for business intelligence, enabling users to connect to various data sources, visualize data, and share insights across an organization. At the heart of any Power BI project is the data model, a crucial component that defines how data is organized, related, and utilized for analysis. Understanding data models is key to unlocking the full potential of Power BI. &lt;/p&gt;

&lt;p&gt;This article explains what data models are, why they matter in Power BI, and how to design clean, fast, and correct models using the star schema (recommended) and the snowflake schema (when appropriate). &lt;/p&gt;

&lt;h3&gt;
  
  
  What is a Data Model?
&lt;/h3&gt;

&lt;p&gt;A data model in Power BI is a collection of tables, relationships, and calculations that represent the underlying structure of your data. It defines how data is stored, how different data entities relate to each other, and how calculations (such as measures and calculated columns) are performed. The data model serves as the foundation for creating reports and dashboards, enabling you to analyze and visualize your data in meaningful ways.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Components of a Data Model
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Tables: Fact vs. Dimension - The Critical Distinction&lt;/strong&gt;&lt;br&gt;
In Power BI, tables are the basic building blocks, but understanding their purpose is crucial:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fact Tables&lt;/strong&gt; (The "What Happened" Tables):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Contain measurable, quantitative data (metrics/KPIs)&lt;/li&gt;
&lt;li&gt;Examples: Sales amounts, quantities, counts, durations&lt;/li&gt;
&lt;li&gt;Typically have many rows (millions/billions)&lt;/li&gt;
&lt;li&gt;Store foreign keys that link to dimension tables
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Example Fact Table Structure
Sales_Fact
----------
SalesKey (Primary Key)
DateKey (Foreign Key to Date dimension)
ProductKey (Foreign Key to Product dimension)
CustomerKey (Foreign Key to Customer dimension)
SalesAmount (measure)
Quantity (measure)
Profit (measure)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Dimension Tables (The "Context" Tables):
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Contain descriptive attributes and categories&lt;/li&gt;
&lt;li&gt;Examples: Product details, customer demographics, date hierarchies&lt;/li&gt;
&lt;li&gt;Typically have fewer rows (thousands)&lt;/li&gt;
&lt;li&gt;Provide context for analysis (the "who, what, when, where")
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Example Dimension Table Structure
Product_Dim
-----------
ProductKey (Primary Key)
ProductName
Category
Brand
Color
Size
PriceRange
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Relationships: The Connective Tissue
&lt;/h3&gt;

&lt;p&gt;Relationships define how tables are connected in a data model. They establish links between columns in different tables, allowing you to perform cross-table calculations and create complex data visualizations. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Power BI supports different relationship types:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One-to-many (1:*) - Most common (dimension → fact table)&lt;/li&gt;
&lt;li&gt;Many-to-one (*:1) - Reverse of above&lt;/li&gt;
&lt;li&gt;One-to-one (1:1) - Rare, often indicates design issues&lt;/li&gt;
&lt;li&gt;Many-to-many ()** - Requires careful handling with bridge tables&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Measures and Calculated Columns&lt;/strong&gt;&lt;br&gt;
Measures are calculations used to aggregate data in a model, created using DAX (Data Analysis Expressions). They are computed at query time and don't store data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Total Sales = SUM(Sales[SalesAmount])
Average Price = AVERAGE(Products[Price])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Calculated Columns&lt;/strong&gt; are custom columns created in your tables using DAX. They store data at the row level and are computed during data refresh:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Profit Margin = 
DIVIDE(Sales[Profit],Sales[SalesAmount],0)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Hierarchies
&lt;/h3&gt;

&lt;p&gt;Hierarchies organize data into multiple levels, making it easier to drill down and analyze data at different granularities. For example, a date hierarchy might include levels for year, quarter, month, and day.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Modeling Schemas: Star vs. Snowflake
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Star Schema: Power BI's Gold Standard&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk4nc5rfvip55vq8yblk6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk4nc5rfvip55vq8yblk6.png" alt="Star Schema" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;star schema&lt;/strong&gt; is the most recommended structure for Power BI because of its simplicity and performance benefits. It's called a "star" because when visualized, it resembles a star with a central fact table connected to surrounding dimension tables.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2z5qb99vn8ny2ivsqw70.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2z5qb99vn8ny2ivsqw70.png" alt="Star Schema" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Star Schema Excels in Power BI:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Performance:&lt;/strong&gt; Simple relationships mean faster query execution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DAX Simplicity:&lt;/strong&gt; Clear context transition for calculations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User-Friendly:&lt;/strong&gt; Intuitive for report consumers to understand&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimized Storage:&lt;/strong&gt; Columnstore indexing works optimally&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Snowflake Schema: When Normalization Matters&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftfr33am35znayq6y15dj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftfr33am35znayq6y15dj.png" alt="Snowflake" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A snowflake schema is a normalized version of the star schema, where dimension tables are broken down into multiple related tables. This creates a more complex, "snowflake" pattern.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6333su2kxfrkwdkbbdye.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6333su2kxfrkwdkbbdye.png" alt="Snowflake" width="800" height="559"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When to Consider Snowflake Schema:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your source data is already heavily normalized&lt;/li&gt;
&lt;li&gt;You need to minimize data redundancy for storage efficiency&lt;/li&gt;
&lt;li&gt;Complex dimensions with multiple hierarchical levels&lt;/li&gt;
&lt;li&gt;Integration with existing normalized databases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Trade-off:&lt;/strong&gt; While snowflake schemas reduce data redundancy, they introduce complexity that can impact Power BI performance. More tables mean more relationships, which can slow down calculations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pro Tip:&lt;/strong&gt; In Power BI, you can often "flatten" snowflaked dimensions back into a star schema using Power Query transformations, giving you the best of both worlds.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F03vbxan10bxvyt5quala.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F03vbxan10bxvyt5quala.png" alt="Star vs Snowflake" width="800" height="560"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Why Data Models Matter in Power BI
&lt;/h3&gt;

&lt;p&gt;A well-designed data model is essential for effective data analysis and reporting. Here are a few reasons why data models are important:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Performance: The Speed Difference Is Dramatic&lt;/strong&gt;&lt;br&gt;
Good modeling can make reports 10-100x faster. Here's why:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- With star schema (optimized)
Total Sales = SUM(Sales[Amount])
-- Simple, fast filter propagation

-- With poor modeling (slow)
Total Sales = CALCULATE(SUM(Transactions[Value]),CROSSFILTER(Products[ID],Transactions[ProdID], BOTH),USERELATIONSHIP(Dates[Date], Transactions[TransactionDate]))
-- Multiple cross-filters = performance penalty
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Real Impact:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Star schema report: 3-second load time&lt;/li&gt;
&lt;li&gt;Poorly modeled equivalent: 30-second load time&lt;/li&gt;
&lt;li&gt;User adoption difference: 80% vs. 20%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Accuracy: Trustworthy Numbers or Guesswork?&lt;/strong&gt;&lt;br&gt;
Bad modeling leads to wrong numbers through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fan traps: Many-to-many relationships without proper bridging&lt;/li&gt;
&lt;li&gt;Chasm traps: Missing relationships causing undercounting&lt;/li&gt;
&lt;li&gt;Ambiguous contexts: Multiple active paths between tables&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example: Without proper date dimension relationships, time intelligence functions like TOTALYTD() or SAMEPERIODLASTYEAR() will return incorrect results.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scalability: Future-Proofing Your Solutions&lt;/strong&gt;&lt;br&gt;
As your data grows from thousands to millions of rows, a well-structured model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Handles increased data volume gracefully&lt;/li&gt;
&lt;li&gt;Maintains consistent performance&lt;/li&gt;
&lt;li&gt;Allows easy addition of new data sources&lt;/li&gt;
&lt;li&gt;Supports row-level security implementation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Maintainability: Reducing Technical Debt&lt;/strong&gt;&lt;br&gt;
A clean model is easier to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Update with new requirements&lt;/li&gt;
&lt;li&gt;Troubleshoot when issues arise&lt;/li&gt;
&lt;li&gt;Hand over to other developers&lt;/li&gt;
&lt;li&gt;Document and understand months later&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F648ws9wwpos2xlo5tloq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F648ws9wwpos2xlo5tloq.png" alt="Star vs Snowflake" width="770" height="834"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Creating a Data Model in Power BI
&lt;/h3&gt;

&lt;p&gt;Creating a data model in Power BI involves several steps, from importing data to defining relationships and calculations. Let's walk through the process:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Import and Transform Data&lt;/strong&gt;&lt;br&gt;
Start by importing data from your sources. Power BI supports a wide range of data connectors. Once imported, use Power Query Editor to clean and transform your data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Transformation for Modeling:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a proper Date table (never use fact table dates directly)&lt;/li&gt;
&lt;li&gt;Flatten normalized structures into star schema when possible&lt;/li&gt;
&lt;li&gt;Ensure consistent data types across related columns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Define Relationships and Choose Schema&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;After loading your data, define relationships between tables. Remember:&lt;/li&gt;
&lt;li&gt;Aim for star schema whenever possible&lt;/li&gt;
&lt;li&gt;Set proper cross-filter direction (usually single direction)&lt;/li&gt;
&lt;li&gt;Use integer keys for relationships (faster than text)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Create Measures and Calculated Columns&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;With your schema established, create business logic:&lt;/li&gt;
&lt;li&gt;Measures for aggregate calculations (sums, averages, ratios)&lt;/li&gt;
&lt;li&gt;Calculated columns for row-level categorization&lt;/li&gt;
&lt;li&gt;Time intelligence measures using date dimension&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Organize and Document Your Model&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Group related tables into folders&lt;/li&gt;
&lt;li&gt;Use clear, consistent naming conventions&lt;/li&gt;
&lt;li&gt;Hide unnecessary fields from report view&lt;/li&gt;
&lt;li&gt;Add descriptions to tables and columns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;5. Optimize for Performance&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Remove unnecessary columns&lt;/li&gt;
&lt;li&gt;Reduce cardinality where possible&lt;/li&gt;
&lt;li&gt;Optimize DAX calculations&lt;/li&gt;
&lt;li&gt;Use Performance Analyzer to identify bottlenecks&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Best Practices for Data Modeling in Power BI
&lt;/h3&gt;

&lt;p&gt;To ensure that your data model is robust and efficient, consider the following best practices:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Always Start with Star Schema Design&lt;/strong&gt;&lt;br&gt;
Plan your model with star schema as the default choice. Deviate only when you have specific normalization requirements that justify the complexity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Implement a Proper Date Dimension&lt;/strong&gt;&lt;br&gt;
Every fact table needs a dedicated date dimension table with continuous dates and time intelligence columns.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Create comprehensive Date table
Date = ADDCOLUMNS(CALENDAR(DATE(2020,1,1), DATE(2025,12,31)),"Year", YEAR([Date]),"Quarter", "Q" &amp;amp; FORMAT([Date], "Q"),"Month", FORMAT([Date], "MMM"),"Weekday", FORMAT([Date], "dddd"),"IsWeekend", IF(WEEKDAY([Date],2) &amp;gt; 5, TRUE, FALSE))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Use Integer Keys for Relationships&lt;/strong&gt;&lt;br&gt;
Integer keys (especially surrogate keys) perform better than text keys in relationships.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Avoid Circular References and Both-Direction Filtering&lt;/strong&gt;&lt;br&gt;
Circular relationships and both-direction filtering can cause performance issues and incorrect results. Use single-direction filtering as the default.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Prefer Measures Over Calculated Columns&lt;/strong&gt;&lt;br&gt;
Measures are more flexible and don't increase model size. Use calculated columns only when you need row-level categorization for filtering or grouping.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Implement Row-Level Security Early&lt;/strong&gt;&lt;br&gt;
Design your model with security in mind from the beginning, not as an afterthought.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;7. Regularly Test and Validate&lt;/strong&gt;&lt;br&gt;
Test with sample data before full implementation&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Validate calculations against source systems&lt;/li&gt;
&lt;li&gt;Use Performance Analyzer to identify bottlenecks&lt;/li&gt;
&lt;li&gt;Get user feedback on report responsiveness&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Conclusion: The Foundation of Power BI Success
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Data models&lt;/strong&gt; are indeed the backbone of any Power BI project. They provide the structure and logic needed to transform raw data into actionable insights. By understanding the critical distinction between fact and dimension tables, implementing the appropriate star or snowflake schema, and following proven best practices, you create a foundation that ensures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Blazing Performance:&lt;/strong&gt; Reports that load in seconds, not minutes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unquestionable Accuracy:&lt;/strong&gt; Numbers stakeholders can trust&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Effortless Scalability:&lt;/strong&gt; Models that grow with your business&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sustainable Maintenance:&lt;/strong&gt; Solutions that don't become technical debt&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The choice between star and snowflake schema isn't just academic—it's a practical decision with real consequences for report performance and user experience. While star schema is generally the better choice for Power BI, understanding both approaches allows you to make informed decisions based on your specific requirements.&lt;/p&gt;

&lt;p&gt;Remember: In Power BI, &lt;strong&gt;the model is the report&lt;/strong&gt;. Beautiful visualizations cannot compensate for a broken foundation. Invest time in proper data modeling, and every report you build will be faster, more accurate, and more maintainable.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;u&gt;&lt;strong&gt;&lt;em&gt;Data Analysis Step by Step;&lt;/em&gt;&lt;/strong&gt;&lt;/u&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;1st Read: Git &amp;amp; Github Beginner's guide&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you’re also learning version control with Git, you can read my Git &amp;amp; GitHub beginner’s guide here: &lt;br&gt;
&lt;a href="https://dev.to/charles_ndungu/git-for-data-scientists-data-engineers-my-very-first-beginner-guide-git-bash-github-3952"&gt;https://dev.to/charles_ndungu/git-for-data-scientists-data-engineers-my-very-first-beginner-guide-git-bash-github-3952&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;2nd Read: Mastering Excel&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After mastering Git basics, you can learn how to analyze data using Microsoft Excel here:&lt;br&gt;
&lt;a href="https://dev.to/charles_ndungu/ms-excel-for-data-analytics-a-friendly-practical-guide-for-beginners-hjn"&gt;https://dev.to/charles_ndungu/ms-excel-for-data-analytics-a-friendly-practical-guide-for-beginners-hjn&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;3rd Read: Data Modelling &amp;amp; Schemas&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This article dives into data modelling in Power BI, covering star and snowflake schemas, fact and dimension tables, relationships, and why good modelling is essential for accurate insights and fast reports.&lt;br&gt;
&lt;a href="https://dev.to/charles_ndungu/the-backbone-of-power-bi-a-deep-dive-into-data-modeling-schemas-1o1l"&gt;https://dev.to/charles_ndungu/the-backbone-of-power-bi-a-deep-dive-into-data-modeling-schemas-1o1l&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;4th Read: Data Analysis Steps in Power BI&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This article reveals how Power BI analysts act as data translators, bridging the gap between messy data and clear business action. We break down their essential three-step process: cleaning raw information, encoding logic with DAX, and designing dashboards that drive real decisions.&lt;br&gt;
&lt;a href="https://dev.to/charles_ndungu/from-raw-data-to-real-action-the-analysts-journey-as-a-data-translator-in-power-bi-2gl6"&gt;https://dev.to/charles_ndungu/from-raw-data-to-real-action-the-analysts-journey-as-a-data-translator-in-power-bi-2gl6&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;5th Read: From Tables to Insights – A SQL Masterclass&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Data lives scattered across tables—customers here, products there, sales everywhere. This article is your guide to bringing it all together. Follow along with a real online store dataset as we build queries that rank top customers, spot sales trends, and uncover hidden patterns. By the end, you won't just write SQL—you'll tell stories with data.&lt;br&gt;
&lt;a href="https://dev.to/charles_ndungu/mastering-sql-joins-and-window-functions-a-practical-guide-with-an-e-commerce-dataset-ejg"&gt;https://dev.to/charles_ndungu/mastering-sql-joins-and-window-functions-a-practical-guide-with-an-e-commerce-dataset-ejg&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repo&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Charles-Ndungu/excel-for-data-analytics" rel="noopener noreferrer"&gt;https://github.com/Charles-Ndungu/excel-for-data-analytics&lt;/a&gt;&lt;/p&gt;

</description>
      <category>powerbi</category>
      <category>datamodeling</category>
      <category>starschema</category>
      <category>dataanalytics</category>
    </item>
    <item>
      <title>MS Excel for Data Analytics — A friendly, practical guide for beginners</title>
      <dc:creator>Charles</dc:creator>
      <pubDate>Sun, 25 Jan 2026 15:44:24 +0000</pubDate>
      <link>https://dev.to/charles_ndungu/ms-excel-for-data-analytics-a-friendly-practical-guide-for-beginners-hjn</link>
      <guid>https://dev.to/charles_ndungu/ms-excel-for-data-analytics-a-friendly-practical-guide-for-beginners-hjn</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpzmxa32ijqmg9afd1zpt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpzmxa32ijqmg9afd1zpt.png" alt="Excel" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Microsoft Excel&lt;/strong&gt; is often the first tool people use when they start learning &lt;strong&gt;data analytics&lt;/strong&gt; — and for good reason. It is easy to use, widely available, and powerful enough to turn raw data into useful insights.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At its core, data analytics is about answering questions using data. For example:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;How much did we sell?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Which product performed best?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Are sales increasing or decreasing over time?&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Excel helps answer these questions by allowing you to organize data in tables, perform simple calculations, summarize large datasets, and create visual charts — all in one place.&lt;/p&gt;

&lt;p&gt;This article is a &lt;strong&gt;step-by-step introduction&lt;/strong&gt; to using Microsoft Excel for basic data analysis. No prior experience is required. We will start with simple concepts and gradually build up, showing how Excel can be used to clean data, calculate key values, and create clear summaries that support decision-making.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F74wx7c0yo49dy45wgaoy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F74wx7c0yo49dy45wgaoy.png" alt="Excel Data Projection" width="752" height="465"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Data Analytics?
&lt;/h2&gt;

&lt;p&gt;Data analytics is simply the process of looking at data to find meaning, patterns, and answers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In everyday life, data analytics helps answer questions like:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How much money did we make this month?&lt;/li&gt;
&lt;li&gt;Which product is selling the most?&lt;/li&gt;
&lt;li&gt;Are sales improving or declining over time?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The “data” can be anything — sales records, survey responses, attendance lists, or expenses. On its own, raw data is often messy and hard to understand. Data analytics helps turn that raw data into &lt;strong&gt;useful information.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Microsoft Excel is one of the easiest tools to use for data analytics. With Excel, you can:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Organize data into rows and columns&lt;/li&gt;
&lt;li&gt;Perform calculations automatically&lt;/li&gt;
&lt;li&gt;Clean and fix messy data&lt;/li&gt;
&lt;li&gt;Summarize large datasets&lt;/li&gt;
&lt;li&gt;Create charts to visualize trends&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why Excel is commonly used as a starting point before learning more advanced tools like Power BI, SQL, or Python.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8js4ftvwm8r8iz798ean.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8js4ftvwm8r8iz798ean.png" alt="Sales Data February" width="608" height="258"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting started with Excel (Rows, Columns &amp;amp; Cells — super simple)
&lt;/h2&gt;

&lt;p&gt;Excel stores information in a big grid. The grid has rows, columns, and cells — that’s it. Think of a spreadsheet like a paper chart: each row is a single item (one sale, one product, one quantity), each column is a type of information about that item (product, quantity, price), and each cell is the box where one piece of information sits (for example: the price of one sale).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why structure matters: good data analysis needs neat, predictable data. That means:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One header row at the top (column names like Date, Product, Quantity)&lt;/li&gt;
&lt;li&gt;No blank rows inside your data block&lt;/li&gt;
&lt;li&gt;No mixed data in a column (don’t put text and numbers in the same column)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When your data follows those simple rules, Excel can easily:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sort and filter it,&lt;/li&gt;
&lt;li&gt;Run basic calculations,&lt;/li&gt;
&lt;li&gt;Summarize it with PivotTables,&lt;/li&gt;
&lt;li&gt;And make charts that actually tell a story.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Quick tip:&lt;/strong&gt; Select your data and press Ctrl + T (or Insert → Table). Turning your range into a Table makes formulas, filters, and charts behave more reliably — and it’s the one small habit that makes Excel feel professional fast.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Excel looks like a big grid.&lt;/strong&gt; That’s it. Behind the grid you’ll do all your data work.&lt;/p&gt;

&lt;p&gt;Think of the grid like a paper ledger or a notebook:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Row&lt;/strong&gt; — a horizontal line (1, 2, 3…).
Each row usually represents &lt;strong&gt;one record&lt;/strong&gt; — for example, one product’s sales in our February table.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Column&lt;/strong&gt; — a vertical line (A, B, C…).
Each column stores &lt;strong&gt;one type of information&lt;/strong&gt; — for example: Product, Unit Price, Quantity, Total.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cell&lt;/strong&gt; — the single box where a row and column cross (A1, B2, C3).
A cell holds &lt;strong&gt;one piece of information&lt;/strong&gt;: &lt;strong&gt;text&lt;/strong&gt; (e.g., Microwave), a number (e.g., 160), a date, or a formula.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo0d47cfy8317xg9sqjdv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo0d47cfy8317xg9sqjdv.png" alt="Columns, Rows &amp;amp; Cells" width="520" height="367"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this simple structure matters for analysis
&lt;/h2&gt;

&lt;p&gt;When your data is tidy — one header row, no blank rows, one type of value per column — Excel can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sort and filter quickly,&lt;/li&gt;
&lt;li&gt;Run formulas that fill down correctly,&lt;/li&gt;
&lt;li&gt;Summarize with PivotTables,&lt;/li&gt;
&lt;li&gt;And build charts that actually make sense.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your data is not tidy (mixed values, empty headers, blank rows) Excel still works — but you’ll waste time untangling problems. Good structure = faster answers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Real example
&lt;/h3&gt;

&lt;p&gt;Look at the row for Air Conditioner:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Product&lt;/th&gt;
&lt;th&gt;Unit Price&lt;/th&gt;
&lt;th&gt;Quantity&lt;/th&gt;
&lt;th&gt;Total&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Air Conditioner&lt;/td&gt;
&lt;td&gt;850&lt;/td&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;td&gt;34,000&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Here:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;One &lt;strong&gt;row&lt;/strong&gt; Air Conditioner record.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;Unit Price&lt;/strong&gt; column always has prices.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;Total&lt;/strong&gt; cell is the &lt;strong&gt;product of Unit Price × Quantity&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In Excel you would put a formula like =C2*D2 in the Total cell and then &lt;strong&gt;drag the fill handle&lt;/strong&gt; down to copy that formula for every row. Excel updates cell references automatically — that’s called &lt;strong&gt;relative referencing&lt;/strong&gt; and it makes repetitive calculations easy.&lt;/p&gt;

&lt;h2&gt;
  
  
  One tiny pro tip for beginners
&lt;/h2&gt;

&lt;p&gt;Select your full range and press Ctrl + T (or Insert → Table).&lt;br&gt;
Turning your range into an Excel Table:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Adds filter arrows automatically,&lt;/li&gt;
&lt;li&gt;Makes formulas easier ([&lt;a class="mentioned-user" href="https://dev.to/unit"&gt;@unit&lt;/a&gt; Price] style),&lt;/li&gt;
&lt;li&gt;Auto-fills formulas for new rows,&lt;/li&gt;
&lt;li&gt;Keeps charts and pivots stable.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This small habit makes your workbook feel professional instantly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick practice (30 seconds)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Click any cell with a number and type =ISNUMBER(A2) in a blank cell — it returns TRUE if A2 is a real number.&lt;/li&gt;
&lt;li&gt;Put =C2*D2 in the first &lt;strong&gt;Total&lt;/strong&gt; cell, press Enter, then drag the fill handle down. See the totals fill? You did calculation automation.&lt;/li&gt;
&lt;li&gt;Select the range and press &lt;strong&gt;Ctrl + T&lt;/strong&gt; to make it a table.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1t7c7l545ipotgcckvq7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1t7c7l545ipotgcckvq7.png" alt="Figure: Rows, columns, and a cell formula (Total = Unit Price × Quantity)." width="532" height="356"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Hands-on: Basic calculations, quick cleaning, and your first PivotTable
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;We’ll do three small things in order:&lt;/li&gt;
&lt;li&gt;Add simple formulas (Total, SUM, AVERAGE, COUNT, IF)&lt;/li&gt;
&lt;li&gt;Quick data-cleaning checks &lt;/li&gt;
&lt;li&gt;Build your first PivotTable to summarize the table&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Add simple formulas/Functions (Total per row + a running summary)
&lt;/h3&gt;

&lt;h3&gt;
  
  
  A — Total per row (you already saw this idea):
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Click the Total cell for the first product (e.g., cell E2 if your table columns are A:SlNo, B:Product, C:Unit Price, D:Quantity, E:Total).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Type =[&lt;a class="mentioned-user" href="https://dev.to/unit"&gt;@unit&lt;/a&gt; Price] * [@Quantity] or if you prefer cell references (example for row 2):=C2*D2&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Press Enter. If your data is an Excel Table, Excel auto-fills the formula for all rows.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzll281n76v56enjlunqa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzll281n76v56enjlunqa.png" alt="Showing Table Formula" width="586" height="304"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  B — Grand Total (SUM)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Below the table, pick a blank cell and type:&lt;strong&gt;=SUM(SalesData[Total])&lt;/strong&gt;&lt;br&gt;
or, if not a table:&lt;strong&gt;=SUM(E2:E7)&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Press Enter → this gives total sales for February.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftql3fa8b78fetdeaey1s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftql3fa8b78fetdeaey1s.png" alt="Grand Total" width="543" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  C — Average &amp;amp; Count
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Average Unit Price:&lt;strong&gt;=AVERAGE(SalesData[Unit Price])&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frtsc962x86dzxr9z1njf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frtsc962x86dzxr9z1njf.png" alt="AVERAGE" width="536" height="396"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Number of product rows: &lt;strong&gt;=COUNTA(SalesData[Product])&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9862olm86acla2915mpw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9862olm86acla2915mpw.png" alt="COUNTA" width="587" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  D — Simple IF check (flag low-selling items)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Add a new column LowSales and use:&lt;strong&gt;=IF([&lt;a class="mentioned-user" href="https://dev.to/total"&gt;@total&lt;/a&gt;] &amp;lt; 10000, "Low", "OK")&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs2nj9jagi68ssjxxkhqv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs2nj9jagi68ssjxxkhqv.png" alt="If Function" width="682" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This flags products with Total &amp;lt; 10,000 for review.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tiny practice for easier understanding;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Change the Quantity for Television to 100 and watch the Grand Total update automatically. You just saw reactive calculation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd1311jxeelypf0hr8kjg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd1311jxeelypf0hr8kjg.png" alt="Function Hacks" width="737" height="439"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick data-cleaning checks
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;These checks help with the following;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Removing Duplicates and Irrelevant Observations: Identifying and deleting repeated records or data points that do not fit the specific problem you are trying to solve, which makes analysis more efficient.&lt;/li&gt;
&lt;li&gt;Fixing Structural Errors: Standardizing inconsistent formats (e.g., converting "st." to "Street") and ensuring uniform naming conventions across the dataset.&lt;/li&gt;
&lt;li&gt;Handling Missing Data: Addressing gaps in the data through methods like imputation (filling in values) or deletion to maintain the integrity of the dataset.&lt;/li&gt;
&lt;li&gt;Managing Outliers: Detecting and handling extreme values that could skew results and lead to inaccurate conclusions.&lt;/li&gt;
&lt;li&gt;Ensuring Data Consistency and Formatting: Scaling or normalizing values to a specific range (e.g., 0 to 1) so that algorithms can process them more effectively.&lt;/li&gt;
&lt;li&gt;Validating Accuracy: Checking for human errors such as typos, misspelled names, or incorrect numerical values against known lists or standards.&lt;/li&gt;
&lt;li&gt;Enhancing Data Compliance: Maintaining accurate and valid records to minimize legal risks and ensure compliance with data regulations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;1) Are numeric cells real numbers?&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;In a blank cell type: &lt;strong&gt;=ISNUMBER(C2)&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Replace C2 with a Unit Price cell. TRUE = good.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3hzwk41cuyhogys15fhs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3hzwk41cuyhogys15fhs.png" alt="Numeric Conf" width="682" height="374"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2) Trim spaces&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If Product names look odd, create helper:&lt;strong&gt;=TRIM([@Product])&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F749i3motb9zh71c8rwtw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F749i3motb9zh71c8rwtw.png" alt="Trimming" width="714" height="370"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Copy the trimmed values → Paste Values → replace original Product column.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3) Remove exact duplicates&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Select the table → Data → Remove Duplicates → check all columns → OK.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh71grmyb812pkjfsrc5j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh71grmyb812pkjfsrc5j.png" alt="Removing Duplicate" width="664" height="373"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4) Find blanks quickly&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Select ALL &lt;strong&gt;Ctrl + A&lt;/strong&gt;, Press &lt;strong&gt;Ctrl + F&lt;/strong&gt; → Leave search blank? (Instead use filter): click any column filter → (Blanks) to show missing values.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fycxx9yz02bak9jiz89ip.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fycxx9yz02bak9jiz89ip.png" alt="Finding Blanks" width="609" height="462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Build your first PivotTable (summarize Total by Product)
&lt;/h2&gt;

&lt;p&gt;Pivot Tables help you summarize large data quickly without complex formulas.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You can answer questions like:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total sales per product&lt;/li&gt;
&lt;li&gt;Sales per month&lt;/li&gt;
&lt;li&gt;Quantity sold by category&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key Functions&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Summarization:&lt;/strong&gt; Automatically calculates sums, averages, counts, or other statistics without requiring manual formulas.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Grouping:&lt;/strong&gt; Organizes information into categories, such as sales by region, employee, or time period.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trend Identification:&lt;/strong&gt; Helps spot patterns, outliers, and relationships within complex datasets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interactive Exploration:&lt;/strong&gt; Uses a drag-and-drop interface to rearrange fields, apply filters, and drill down into specific details instantly. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Four Main Components&lt;/strong&gt;&lt;br&gt;
When building a pivot table, you typically assign your data fields to four areas: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Rows:&lt;/strong&gt; Fields placed here appear as labels down the left side of the table.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Columns:&lt;/strong&gt; Fields placed here appear as headers across the top.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Values:&lt;/strong&gt; This is where numerical data is calculated (e.g., "Sum of Sales" or "Count of Orders").&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filters:&lt;/strong&gt; Allows you to isolate specific subsets of data, such as a single year or a specific department.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Steps (exact Hacks)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Click anywhere inside your SalesData table.&lt;/li&gt;
&lt;li&gt;Go to Insert → PivotTable. Excel will suggest the table range — leave defaults and choose New Worksheet, then OK.&lt;/li&gt;
&lt;li&gt;In the PivotTable Fields pane:&lt;/li&gt;
&lt;li&gt;Drag Product into Rows.&lt;/li&gt;
&lt;li&gt;Drag Total into Values (it defaults to Sum).&lt;/li&gt;
&lt;li&gt;The Pivot shows Sum of Total by Product — great!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Drag LowSales into Filters to quickly filter only low sellers.&lt;/p&gt;

&lt;h2&gt;
  
  
  PivotField Layout
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhght2c3p8t005xjpif9o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhght2c3p8t005xjpif9o.png" alt="PivotField Layout" width="262" height="719"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Pivot Results
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frph9qnhrulc9zbdhuj15.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frph9qnhrulc9zbdhuj15.png" alt="Pivot Results" width="224" height="232"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Charts in Excel
&lt;/h2&gt;

&lt;p&gt;In Microsoft Excel, &lt;strong&gt;charts are visual representations used to make complex data easier to interpret.&lt;/strong&gt; They allow you to illustrate trends, comparisons, and relationships between different data series graphically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Common Chart Types
&lt;/h3&gt;

&lt;p&gt;Excel offers several standard chart types, each suited for different data analysis goals: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Column Charts: Best for comparing items across categories using vertical bars.&lt;/li&gt;
&lt;li&gt;Line Charts: Ideal for showing trends over time or continuous data.&lt;/li&gt;
&lt;li&gt;Pie Charts: Used to show proportions and how parts contribute to a whole for a single data series.&lt;/li&gt;
&lt;li&gt;Bar Charts: Function like column charts but use horizontal bars, making them better for long category labels.&lt;/li&gt;
&lt;li&gt;Scatter (XY) Charts: Used to show relationships between sets of values, common in scientific and engineering data.&lt;/li&gt;
&lt;li&gt;Combo Charts: Combine two or more chart types (e.g., a column and a line chart) to highlight different types of information in one view.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Make a quick PivotChart
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;With the PivotTable selected, click PivotTable Analyze → PivotChart → choose Column.&lt;/li&gt;
&lt;li&gt;Place the chart on the Pivot sheet or copy it to a Dashboard sheet.&lt;/li&gt;
&lt;li&gt;Add a clear title: Revenue by Product — February.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fabjkm6sy5zl81p7hc4rk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fabjkm6sy5zl81p7hc4rk.png" alt="Pivot Visualization" width="537" height="309"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Short notes (best-practice tips)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use Tables (Ctrl + T) — they auto-fill formulas and make Pivots easier.&lt;/li&gt;
&lt;li&gt;Avoid mixing data types in a column (numbers + text).&lt;/li&gt;
&lt;li&gt;Keep original raw data untouched — do cleaning in a copy or new sheet.&lt;/li&gt;
&lt;li&gt;When Pivot doesn’t update after changing data, click Refresh (PivotTable Analyze → Refresh).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Building a Simple Excel Dashboard (Charts, KPIs &amp;amp; Slicers)
&lt;/h2&gt;

&lt;p&gt;Excel dashboards are interactive, visual summaries of key data, using charts, tables, and KPIs in a single view for quick analysis and decision-making.&lt;/p&gt;

&lt;p&gt;In this section, we’ll turn our PivotTable into a clean, interactive dashboard that anyone can understand at a glance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;By the end, you’ll have:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A summary chart&lt;/li&gt;
&lt;li&gt;Clear KPI numbers&lt;/li&gt;
&lt;li&gt;An interactive slicer to filter results instantly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No advanced Excel. Just smart setup.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create a Dashboard Sheet
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Click the + icon to add a new worksheet&lt;/li&gt;
&lt;li&gt;Rename it to Dashboard&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This sheet will only contain visuals and summaries — no raw data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best practice:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Raw Data → one sheet&lt;/li&gt;
&lt;li&gt;PivotTables → one sheet&lt;/li&gt;
&lt;li&gt;Dashboard → one clean sheet&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Add Key KPI Numbers (Top Summary)
&lt;/h3&gt;

&lt;p&gt;From your PivotTable sheet, identify these values:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total Sales (February)&lt;/li&gt;
&lt;li&gt;Number of Products&lt;/li&gt;
&lt;li&gt;Average Sales per Product&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;A — Total Sales&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Copy the Grand Total value from your PivotTable&lt;/li&gt;
&lt;li&gt;Paste it at the top of the Dashboard&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Label it clearly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total Sales – February&lt;/li&gt;
&lt;li&gt;Increase font size, make it bold.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;B — Number of Products Sold&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;C — Average Revenue per Product&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use borders or shapes (Insert → Shapes → Rectangle) to make these KPIs look like cards.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Add a Slicer (Make It Interactive)
&lt;/h3&gt;

&lt;p&gt;This is where the dashboard comes alive.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click anywhere inside the PivotTable&lt;/li&gt;
&lt;li&gt;Go to PivotTable Analyze → Insert Slicer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Select:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Product&lt;/li&gt;
&lt;li&gt;(Optional) LowSales&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Click OK&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Now:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Drag the slicer onto the Dashboard&lt;/li&gt;
&lt;li&gt;Resize it neatly on the side&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Clicking a product instantly updates:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chart&lt;/li&gt;
&lt;li&gt;KPIs&lt;/li&gt;
&lt;li&gt;PivotTable values&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;This is real-time analysis.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Clean Up the Dashboard (Very Important)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Make it executive-friendly:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Remove gridlines&lt;/li&gt;
&lt;li&gt;View → Uncheck Gridlines&lt;/li&gt;
&lt;li&gt;Align everything evenly&lt;/li&gt;
&lt;li&gt;Use consistent fonts&lt;/li&gt;
&lt;li&gt;Avoid clutter — less is more&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Optional polish:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Light background color for KPI cards&lt;/li&gt;
&lt;li&gt;Bold headers&lt;/li&gt;
&lt;li&gt;No unnecessary decimals&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Wrap-Up (What You Built)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Created a dedicated dashboard sheet&lt;/li&gt;
&lt;li&gt;Displayed KPI metrics clearly&lt;/li&gt;
&lt;li&gt;Visualized revenue by product&lt;/li&gt;
&lt;li&gt;Added slicers for instant filtering&lt;/li&gt;
&lt;li&gt;Designed a clean, professional layout&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;This is the foundation of business reporting in Excel.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  KPI cards
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2r8kpvywj054gnzwz4lf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2r8kpvywj054gnzwz4lf.png" alt="KPI CARD" width="570" height="72"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Chart on dashboard
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1rvx0epdy7rpupy5kv4c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1rvx0epdy7rpupy5kv4c.png" alt="Revenue by Product February" width="582" height="303"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Product slicer
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F46ltrtrffc9crnzrwx2o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F46ltrtrffc9crnzrwx2o.png" alt="Product Slicer" width="200" height="230"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Full dashboard view
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fslfmbjnf04ibxa1g09sy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fslfmbjnf04ibxa1g09sy.png" alt="Dashboard" width="792" height="461"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Top 3 Takeaways from This Article
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Excel Turns Raw Data into Meaningful Insights&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Excel is not just for typing numbers. It helps you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Organize messy data&lt;/li&gt;
&lt;li&gt;Perform calculations automatically&lt;/li&gt;
&lt;li&gt;Identify patterns and trends&lt;/li&gt;
&lt;li&gt;Make data-driven decisions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With just a few formulas, tables, and charts, raw data becomes information you can actually understand and use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You Don’t Need to Be an Expert to Analyze Data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You don’t need coding or advanced tools to start data analytics.&lt;br&gt;
By learning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Basic formulas (SUM, AVERAGE, IF)&lt;/li&gt;
&lt;li&gt;Data cleaning (sorting, filtering, removing duplicates)&lt;/li&gt;
&lt;li&gt;PivotTables and simple charts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You already have enough skills to analyze real-world data confidently as a beginner.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Excel Is the Perfect First Step into Data Analytics&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Excel builds the foundation for everything else.&lt;br&gt;
Once you understand Excel:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Power BI becomes easier&lt;/li&gt;
&lt;li&gt;SQL queries make more sense&lt;/li&gt;
&lt;li&gt;Python feels less intimidating&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Excel teaches you how to think with data, not just how to calculate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Every data analyst starts somewhere. Mastering Excel is one of the smartest places to begin. Practice with small datasets, stay curious, and keep building — your analytics journey has officially started 🚀&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;u&gt;&lt;strong&gt;&lt;em&gt;Data Analysis Step by Step;&lt;/em&gt;&lt;/strong&gt;&lt;/u&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;1st Read: Git &amp;amp; Github Beginner's guide&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you’re also learning version control with Git, you can read my Git &amp;amp; GitHub beginner’s guide here: &lt;br&gt;
&lt;a href="https://dev.to/charles_ndungu/git-for-data-scientists-data-engineers-my-very-first-beginner-guide-git-bash-github-3952"&gt;https://dev.to/charles_ndungu/git-for-data-scientists-data-engineers-my-very-first-beginner-guide-git-bash-github-3952&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;2nd Read: Mastering Excel&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After mastering Git basics, you can learn how to analyze data using Microsoft Excel here:&lt;br&gt;
&lt;a href="https://dev.to/charles_ndungu/ms-excel-for-data-analytics-a-friendly-practical-guide-for-beginners-hjn"&gt;https://dev.to/charles_ndungu/ms-excel-for-data-analytics-a-friendly-practical-guide-for-beginners-hjn&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;3rd Read: Data Modelling &amp;amp; Schemas&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This article dives into data modelling in Power BI, covering star and snowflake schemas, fact and dimension tables, relationships, and why good modelling is essential for accurate insights and fast reports.&lt;br&gt;
&lt;a href="https://dev.to/charles_ndungu/the-backbone-of-power-bi-a-deep-dive-into-data-modeling-schemas-1o1l"&gt;https://dev.to/charles_ndungu/the-backbone-of-power-bi-a-deep-dive-into-data-modeling-schemas-1o1l&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;4th Read: Data Analysis Steps in Power BI&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This article reveals how Power BI analysts act as data translators, bridging the gap between messy data and clear business action. We break down their essential three-step process: cleaning raw information, encoding logic with DAX, and designing dashboards that drive real decisions.&lt;br&gt;
&lt;a href="https://dev.to/charles_ndungu/from-raw-data-to-real-action-the-analysts-journey-as-a-data-translator-in-power-bi-2gl6"&gt;https://dev.to/charles_ndungu/from-raw-data-to-real-action-the-analysts-journey-as-a-data-translator-in-power-bi-2gl6&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;5th Read: From Tables to Insights – A SQL Masterclass&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Data lives scattered across tables—customers here, products there, sales everywhere. This article is your guide to bringing it all together. Follow along with a real online store dataset as we build queries that rank top customers, spot sales trends, and uncover hidden patterns. By the end, you won't just write SQL—you'll tell stories with data.&lt;br&gt;
&lt;a href="https://dev.to/charles_ndungu/mastering-sql-joins-and-window-functions-a-practical-guide-with-an-e-commerce-dataset-ejg"&gt;https://dev.to/charles_ndungu/mastering-sql-joins-and-window-functions-a-practical-guide-with-an-e-commerce-dataset-ejg&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repo&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Charles-Ndungu/excel-for-data-analytics" rel="noopener noreferrer"&gt;https://github.com/Charles-Ndungu/excel-for-data-analytics&lt;/a&gt;&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>beginners</category>
      <category>analytics</category>
      <category>dataanalytics</category>
    </item>
    <item>
      <title>Git for Data Analysts: My First Beginner Guide (Git Bash + GitHub Step-by-Step)</title>
      <dc:creator>Charles</dc:creator>
      <pubDate>Sat, 17 Jan 2026 13:49:25 +0000</pubDate>
      <link>https://dev.to/charles_ndungu/git-for-data-scientists-data-engineers-my-very-first-beginner-guide-git-bash-github-3952</link>
      <guid>https://dev.to/charles_ndungu/git-for-data-scientists-data-engineers-my-very-first-beginner-guide-git-bash-github-3952</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F47d0glr722aajocxypp1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F47d0glr722aajocxypp1.png" alt="Data Science &amp;amp; Data Engineers" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Hi, I’m &lt;strong&gt;Charles Ndungu&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I recently started learning &lt;strong&gt;data science&lt;/strong&gt; and &lt;strong&gt;analytics&lt;/strong&gt;, and one tool that kept appearing everywhere was Git. At first, Git felt intimidating — lots of commands, confusing terminology, and unclear workflows.&lt;/p&gt;

&lt;p&gt;So I decided to &lt;strong&gt;learn it step by step and document my experience.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This article walks through &lt;strong&gt;my very first Git workflow&lt;/strong&gt;, from creating a local project to pushing it to GitHub.&lt;/p&gt;

&lt;p&gt;If you're a &lt;strong&gt;beginner&lt;/strong&gt; in &lt;strong&gt;data science&lt;/strong&gt;, &lt;strong&gt;analytics&lt;/strong&gt;, or &lt;strong&gt;engineering&lt;/strong&gt;, this guide should help you get started.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Practice repository&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/Charles-Ndungu/GIT-basics-practice" rel="noopener noreferrer"&gt;https://github.com/Charles-Ndungu/GIT-basics-practice&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Why Git Matters (Especially for Data Work)
&lt;/h3&gt;

&lt;p&gt;Even if you're mainly working with &lt;strong&gt;Python&lt;/strong&gt;, &lt;strong&gt;SQL&lt;/strong&gt;, &lt;strong&gt;Power BI&lt;/strong&gt;, or &lt;strong&gt;Excel&lt;/strong&gt;, Git quickly becomes essential.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here’s why:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Reproducibility&lt;/strong&gt; - You can return to any previous version of your project.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Collaboration&lt;/strong&gt; - Multiple people can work on the same project safely.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Safety&lt;/strong&gt; - You can experiment without losing working code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Professional Standard&lt;/strong&gt; - Most modern data teams expect Git knowledge.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  The Simple Mental Model
&lt;/h3&gt;

&lt;p&gt;Before touching commands, this mental model helped me a lot.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Concept&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Meaning&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Local folder&lt;/td&gt;
&lt;td&gt;Your project on your computer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Git&lt;/td&gt;
&lt;td&gt;The system that tracks changes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Commit&lt;/td&gt;
&lt;td&gt;A saved snapshot of your project&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Remote&lt;/td&gt;
&lt;td&gt;The online copy (GitHub)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Push&lt;/td&gt;
&lt;td&gt;Upload your work&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pull&lt;/td&gt;
&lt;td&gt;Download updates&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The basic workflow is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;init → add → commit → push → pull
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once you understand this cycle, Git becomes much easier.&lt;/p&gt;

&lt;h3&gt;
  
  
  What You’ll Need
&lt;/h3&gt;

&lt;p&gt;To follow along:&lt;br&gt;
• Windows computer (this guide uses Git Bash)&lt;br&gt;
• GitHub account&lt;br&gt;
&lt;a href="https://github.com" rel="noopener noreferrer"&gt;https://github.com&lt;/a&gt;&lt;br&gt;
• Git installed&lt;br&gt;
&lt;a href="https://git-scm.com/downloads" rel="noopener noreferrer"&gt;https://git-scm.com/downloads&lt;/a&gt;&lt;br&gt;
• (Optional) VS Code or another editor&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 1 — Install Git
&lt;/h3&gt;

&lt;p&gt;Download Git from: &lt;br&gt;
&lt;a href="https://git-scm.com/downloads" rel="noopener noreferrer"&gt;https://git-scm.com/downloads&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run the installer and accept the default settings.&lt;/li&gt;
&lt;li&gt;Once installed, open &lt;strong&gt;Git Bash&lt;/strong&gt; from the Start Menu.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Step 2 — Configure Git (One Time Only)
&lt;/h3&gt;

&lt;p&gt;Git needs your identity for commits.&lt;br&gt;
Open Git Bash and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git --version

git config --global user.name "Your Name"
git config --global user.email "you@example.com"

git config --list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This sets your global configuration.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3 — Create Your First Local Git Project
&lt;/h3&gt;

&lt;p&gt;These are the exact commands I used.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# create a project folder
mkdir ~/git-basics-practice

# enter the folder
cd ~/git-basics-practice

# create a simple file
echo "hello from git bash" &amp;gt; hello.txt

# initialize git
git init

# check project status
git status

# stage the file
git add hello.txt

# create first commit
git commit -m "Add hello.txt"

# view history
git log --oneline
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point, Git is &lt;strong&gt;tracking your project locally&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4 — Create a GitHub Repository
&lt;/h3&gt;

&lt;p&gt;Now we connect the project to GitHub.&lt;/p&gt;

&lt;p&gt;On GitHub:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Click + → &lt;strong&gt;New Repository&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Repository name:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git-basics-practice
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Important:&lt;br&gt;
Do &lt;strong&gt;NOT&lt;/strong&gt; initialize with README or &lt;code&gt;.gitignore&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Copy the repository &lt;code&gt;URL&lt;/code&gt;.&lt;br&gt;
Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://github.com/your-username/git-basics-practice.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 5 — Push the Project to GitHub
&lt;/h3&gt;

&lt;p&gt;Back in Git Bash:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git remote add origin https://github.com/your-username/git-basics-practice.git

git branch -M main

git push -u origin main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If successful, your project will now appear on GitHub.&lt;/p&gt;

&lt;h4&gt;
  
  
  Important Authentication Note
&lt;/h4&gt;

&lt;p&gt;GitHub &lt;strong&gt;no longer allows account passwords&lt;/strong&gt; when pushing from the command line.&lt;br&gt;
You must use either:&lt;/p&gt;

&lt;p&gt;• &lt;strong&gt;Personal Access Token (PAT)&lt;/strong&gt;&lt;br&gt;
• &lt;strong&gt;SSH authentication&lt;/strong&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  Personal Access Token (PAT)
&lt;/h4&gt;

&lt;p&gt;Generate a token from GitHub and use it instead of your password.&lt;br&gt;
or&lt;/p&gt;
&lt;h4&gt;
  
  
  SSH Keys (Recommended)
&lt;/h4&gt;

&lt;p&gt;Generate an SSH key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ssh-keygen -t ed25519 -C "you@example.com"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then add the &lt;strong&gt;public key&lt;/strong&gt; to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GitHub → Settings → SSH and GPG keys
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 6 — Pulling Changes
&lt;/h3&gt;

&lt;p&gt;If someone updates the repository, or you edit directly on GitHub, download the changes using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git pull origin main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Git Command Cheat Sheet
&lt;/h3&gt;

&lt;p&gt;These commands are enough to start using Git comfortably.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git init                 start tracking a folder
git status               check project state
git add &amp;lt;file&amp;gt;           stage file
git commit -m "message"  save snapshot
git log --oneline        view commit history
git remote -v            show remote repositories
git push -u origin main  upload commits
git pull origin main     download changes
git checkout -b name     create new branch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  A Real Beginner Mistake I Made
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;When I first pushed my repository, Git said the push was successful.&lt;/li&gt;
&lt;li&gt;But when I opened the repo in the browser I got a &lt;strong&gt;404 error&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;After some debugging, I discovered the issue:&lt;/li&gt;
&lt;li&gt;My remote URL accidentally had an &lt;strong&gt;extra dot&lt;/strong&gt; at the end:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GIT-basics-practice..git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Git accepted the push, but the browser link didn’t match the repository name.&lt;br&gt;
If you encounter a &lt;strong&gt;404 error&lt;/strong&gt;, check your remote using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git remote -v
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Small typos in URLs can cause confusing errors.&lt;/p&gt;

&lt;h4&gt;
  
  
  Key Takeaways
&lt;/h4&gt;

&lt;p&gt;If you remember only three things about Git:&lt;br&gt;
• &lt;code&gt;git init&lt;/code&gt; starts tracking a project&lt;br&gt;
• &lt;code&gt;git commit&lt;/code&gt; saves a snapshot&lt;br&gt;
• &lt;code&gt;git push&lt;/code&gt; uploads your work to GitHub&lt;/p&gt;

&lt;p&gt;Once you understand this cycle, Git becomes much easier.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Analysis Learning Path
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Git &amp;amp; Version Control  
2. Excel for Data Analysis  
3. Data Modeling  
4. Power BI Dashboards  
5. SQL for Data Analysis
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  &lt;u&gt;&lt;strong&gt;&lt;em&gt;Data Analysis Step by Step;&lt;/em&gt;&lt;/strong&gt;&lt;/u&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;1st Read: Git &amp;amp; Github Beginner's guide&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you’re also learning version control with Git, you can read my Git &amp;amp; GitHub beginner’s guide here: &lt;br&gt;
&lt;a href="https://dev.to/charles_ndungu/git-for-data-scientists-data-engineers-my-very-first-beginner-guide-git-bash-github-3952"&gt;https://dev.to/charles_ndungu/git-for-data-scientists-data-engineers-my-very-first-beginner-guide-git-bash-github-3952&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;2nd Read: Mastering Excel&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After mastering Git basics, you can learn how to analyze data using Microsoft Excel here:&lt;br&gt;
&lt;a href="https://dev.to/charles_ndungu/ms-excel-for-data-analytics-a-friendly-practical-guide-for-beginners-hjn"&gt;https://dev.to/charles_ndungu/ms-excel-for-data-analytics-a-friendly-practical-guide-for-beginners-hjn&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;3rd Read: Data Modelling &amp;amp; Schemas&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This article dives into data modelling in Power BI, covering star and snowflake schemas, fact and dimension tables, relationships, and why good modelling is essential for accurate insights and fast reports.&lt;br&gt;
&lt;a href="https://dev.to/charles_ndungu/the-backbone-of-power-bi-a-deep-dive-into-data-modeling-schemas-1o1l"&gt;https://dev.to/charles_ndungu/the-backbone-of-power-bi-a-deep-dive-into-data-modeling-schemas-1o1l&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;4th Read: Data Analysis Steps in Power BI&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This article reveals how Power BI analysts act as data translators, bridging the gap between messy data and clear business action. We break down their essential three-step process: cleaning raw information, encoding logic with DAX, and designing dashboards that drive real decisions.&lt;br&gt;
&lt;a href="https://dev.to/charles_ndungu/from-raw-data-to-real-action-the-analysts-journey-as-a-data-translator-in-power-bi-2gl6"&gt;https://dev.to/charles_ndungu/from-raw-data-to-real-action-the-analysts-journey-as-a-data-translator-in-power-bi-2gl6&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;5th Read: From Tables to Insights – A SQL Masterclass&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Data lives scattered across tables—customers here, products there, sales everywhere. This article is your guide to bringing it all together. Follow along with a real online store dataset as we build queries that rank top customers, spot sales trends, and uncover hidden patterns. By the end, you won't just write SQL—you'll tell stories with data.&lt;br&gt;
&lt;a href="https://dev.to/charles_ndungu/mastering-sql-joins-and-window-functions-a-practical-guide-with-an-e-commerce-dataset-ejg"&gt;https://dev.to/charles_ndungu/mastering-sql-joins-and-window-functions-a-practical-guide-with-an-e-commerce-dataset-ejg&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repo&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Charles-Ndungu/excel-for-data-analytics" rel="noopener noreferrer"&gt;https://github.com/Charles-Ndungu/excel-for-data-analytics&lt;/a&gt;&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>ai</category>
      <category>git</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
