<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Isika Millicent</title>
    <description>The latest articles on DEV Community by Isika Millicent (@isika_millicent).</description>
    <link>https://dev.to/isika_millicent</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3718312%2F7b946392-3db5-42a7-a3b4-787cf6752279.png</url>
      <title>DEV Community: Isika Millicent</title>
      <link>https://dev.to/isika_millicent</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/isika_millicent"/>
    <language>en</language>
    <item>
      <title>Git Demystified: How I Learned Git, Git Bash, GitHub, and Terminals as a Complete Beginner</title>
      <dc:creator>Isika Millicent</dc:creator>
      <pubDate>Wed, 11 Mar 2026 09:47:23 +0000</pubDate>
      <link>https://dev.to/isika_millicent/git-demystified-how-i-learned-git-git-bash-github-and-terminals-as-a-complete-beginner-264d</link>
      <guid>https://dev.to/isika_millicent/git-demystified-how-i-learned-git-git-bash-github-and-terminals-as-a-complete-beginner-264d</guid>
      <description>&lt;p&gt;When I first started learning version control, I kept seeing four things mentioned everywhere: Git, GitHub, Git Bash, and different terminals like Anaconda Prompt or the CMD&lt;/p&gt;

&lt;p&gt;And honestly, I was confused.&lt;/p&gt;

&lt;p&gt;Not the kind of confusion where you do not understand a command. The kind where you do not even understand &lt;em&gt;where&lt;/em&gt; you are supposed to type the command in the first place.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the Confusion Started
&lt;/h2&gt;

&lt;p&gt;Let me share what actually happened.&lt;/p&gt;

&lt;p&gt;When I first started learning Git, I expected it to be straightforward.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install Git. &lt;/li&gt;
&lt;li&gt;Open Git Bash. &lt;/li&gt;
&lt;li&gt;Run the commands. &lt;/li&gt;
&lt;li&gt;Done&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Except Git Bash was not working properly on my machine.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I would open it and immediately get a red error message. Something about a child process and DLL rebasing. I had no idea what any of that meant. I tried running it as Administrator. Still broken. I searched online. Got more confused. Closed the terminal. Opened it again. Same error.&lt;/p&gt;

&lt;p&gt;I spent more time trying to fix Git Bash than I spent actually learning Git. That felt wrong, but I did not know what else to do.&lt;/p&gt;

&lt;p&gt;And here is what made it worse: Every single tutorial I found started with the same two words.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Open Git Bash."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So in my mind, the entire relationship looked like this: &lt;strong&gt;Git - Git Bash - GitHub&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Git Bash felt like the place where Git lived. Without it, I assumed I could not use Git at all.&lt;/p&gt;




&lt;h2&gt;
  
  
  My Workaround — And Why It Was a Problem
&lt;/h2&gt;

&lt;p&gt;Around the same time I was struggling with Git Bash, I had just started learning Python and had installed Anaconda. I was using &lt;strong&gt;Anaconda Prompt&lt;/strong&gt; every day for my Python work, so it was already open on my machine.&lt;/p&gt;

&lt;p&gt;Since Git Bash was not cooperating, I took what felt like the easiest path at the time. I started uploading my work directly to GitHub.&lt;/p&gt;

&lt;p&gt;No commits. No Git commands. No terminal at all. Just dragging files onto the GitHub website and clicking upload.&lt;/p&gt;

&lt;p&gt;At the time, it seemed fine. The code was online, after all. My work was saved. What was the problem?&lt;/p&gt;

&lt;p&gt;The problem was that this is not how developers actually work. And the more tutorials I watched, the more obvious that became.&lt;/p&gt;

&lt;p&gt;Everyone was using commands like git add, git commit, and git push. They had a history of changes. They could go back to previous versions. They could work in teams without overwriting each other's work.&lt;/p&gt;

&lt;p&gt;I was just throwing files at a website and hoping for the best.&lt;/p&gt;

&lt;p&gt;Direct uploading skips the entire point of Git. You get no version history, no commit messages, no record of what changed or why. &lt;/p&gt;




&lt;h2&gt;
  
  
  The Realization That Changed Everything
&lt;/h2&gt;

&lt;p&gt;Since Git Bash was broken, my first question was a practical one: is Git even installed properly on this machine? I needed to check. So, I opened CMD first, the basic Windows terminal that has always been there and typed:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;git --version&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It worked. Git version 2.52.0. No errors. No red text.&lt;/p&gt;

&lt;p&gt;Then I tried the same thing in Anaconda Prompt, which I already had open for my Python work:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;git --version&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That one moment changed how I understood everything.&lt;br&gt;
Git commands do not belong to Git Bash. They belong to Git itself. Git Bash is just one of many places where those commands can run.&lt;/p&gt;

&lt;p&gt;Once Git is installed on your computer, any terminal can use it. Not just Git Bash. CMD works. PowerShell works. Anaconda Prompt works. The terminal inside Visual Studio Code works.&lt;/p&gt;

&lt;p&gt;The terminal is just the interface. Git is the actual tool doing the work.&lt;/p&gt;

&lt;p&gt;That small realization made everything less intimidating. I stopped worrying about which terminal to use and started focusing on learning Git itself.&lt;/p&gt;




&lt;h1&gt;
  
  
  Understanding the Terms
&lt;/h1&gt;

&lt;h3&gt;
  
  
  1. What is Git?
&lt;/h3&gt;

&lt;p&gt;Git is a version control system. It keeps a record of every change you make to your files so you can go back to any previous version at any time.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
You are working on an assignment, everything is going well. Then you try something new and suddenly everything breaks. You wish you could go back to the version that was working. But you cannot, because you saved over it.&lt;/p&gt;

&lt;p&gt;We've all done something like this:&lt;/p&gt;

&lt;p&gt;myassignment.py&lt;br&gt;
myassignment_v2.py&lt;br&gt;
myassignment_FINAL.py&lt;br&gt;
myassignment_FINAL_v2.py&lt;br&gt;
myassignment_FINAL_ACTUALLY_FINAL.py&lt;/p&gt;

&lt;p&gt;Git solves this problem elegantly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Benefits of Git&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benefit&lt;/th&gt;
&lt;th&gt;What It Means For You&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Version History&lt;/td&gt;
&lt;td&gt;Every change is recorded. Go back anytime.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Safety Net&lt;/td&gt;
&lt;td&gt;Restore the working version instantly.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Teamwork&lt;/td&gt;
&lt;td&gt;Multiple people can work on the same project seamlessly.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Accountability&lt;/td&gt;
&lt;td&gt;See what changed, when, and who did it.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Professional Standard&lt;/td&gt;
&lt;td&gt;Every tech company in the world uses Git.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  2. What is GitHub?
&lt;/h3&gt;

&lt;p&gt;This was my biggest confusion for the longest time. I kept hearing Git and GitHub used together and assumed they were the same thing.&lt;/p&gt;

&lt;p&gt;They are not.&lt;/p&gt;

&lt;p&gt;Git tracks changes on your computer. GitHub stores your code online. Git was created in 2005. GitHub was built later in 2008 as a platform for hosting Git repositories.&lt;/p&gt;

&lt;p&gt;When I uploaded files directly to GitHub without Git, I had storage without version control—like backing up a Word document without using track changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Git vs GitHub&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Git&lt;/th&gt;
&lt;th&gt;GitHub&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;What it is&lt;/td&gt;
&lt;td&gt;Software on your computer&lt;/td&gt;
&lt;td&gt;A website on the internet&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Where it lives&lt;/td&gt;
&lt;td&gt;Your local machine&lt;/td&gt;
&lt;td&gt;Online cloud storage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Main purpose&lt;/td&gt;
&lt;td&gt;Track code changes&lt;/td&gt;
&lt;td&gt;Store and share code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Needs internet&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Free to use&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes (with limits)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Why GitHub Matters&lt;/strong&gt;&lt;br&gt;
Your GitHub profile is your professional portfolio. Employers often look at GitHub before your CV. Every commit tells a story of your progress. Treat it like your digital CV from day one. It shows them real work, real code, and real progress over time.&lt;/p&gt;

&lt;p&gt;Every assignment you push is proof of your skills. Every commit tells a story of how you improved. Start treating your GitHub like your digital CV from day one.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. What is Git Bash?
&lt;/h3&gt;

&lt;p&gt;Git Bash is simply a terminal, a window where you type commands, that comes bundled with Git when you install it on Windows.&lt;/p&gt;

&lt;p&gt;Windows already has terminals like CMD and PowerShell. Git Bash exists because Windows doesn’t natively support Linux-style commands, which are standard in most professional environments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Makes Git Bash Useful&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Git Bash&lt;/th&gt;
&lt;th&gt;Regular CMD&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Shows current branch name&lt;/td&gt;
&lt;td&gt;Yes — in the prompt itself&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Coloured output&lt;/td&gt;
&lt;td&gt;Yes — green, red, yellow&lt;/td&gt;
&lt;td&gt;Plain white text only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Linux commands (ls, rm, touch)&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No — uses dir, del instead&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tab auto-complete&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Optimised for Git&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Works but basic&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Getting Git Bash&lt;/strong&gt;&lt;br&gt;
Git Bash is not a separate download. It comes automatically when you install Git:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to git-scm.com&lt;/li&gt;
&lt;li&gt;Download Git for Windows&lt;/li&gt;
&lt;li&gt;Install using the default settings&lt;/li&gt;
&lt;li&gt;Git Bash appears in your Start Menu automatically&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Part Nobody Told Me:&lt;/strong&gt; Any Terminal Works!&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This is the core insight of this entire article. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Once Git is installed on your computer, you can run Git commands from any terminal. Not just Git Bash.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;git init&lt;br&gt;
git add .&lt;br&gt;
git commit -m "first commit"&lt;br&gt;
git push&lt;/code&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;These commands work in Git Bash. They work in CMD. They work in PowerShell. They work in Anaconda Prompt. They work in the terminal inside Visual Studio Code.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;strong&gt;Using Anaconda Prompt for Git&lt;/strong&gt;&lt;br&gt;
This is the option that saved me when Git Bash was broken, and it is what I genuinely recommend for data science beginners like me.&lt;/p&gt;

&lt;p&gt;If you already have Anaconda installed for Python and Jupyter, you already have a terminal that works perfectly for Git. You do not need to install or fix anything extra.&lt;/p&gt;

&lt;p&gt;You can manage Python, run Jupyter notebooks, and push to GitHub all from the same Anaconda Prompt window. One tool. Less confusion.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Terminal Comparison&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Terminal&lt;/th&gt;
&lt;th&gt;Git Support&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;th&gt;Recommended?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Git Bash&lt;/td&gt;
&lt;td&gt;Full + extra features&lt;/td&gt;
&lt;td&gt;Git-focused work&lt;/td&gt;
&lt;td&gt;Yes, once working correctly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anaconda Prompt&lt;/td&gt;
&lt;td&gt;Full support&lt;/td&gt;
&lt;td&gt;Data science + Git&lt;/td&gt;
&lt;td&gt;Yes — great for beginners&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CMD&lt;/td&gt;
&lt;td&gt;Full support&lt;/td&gt;
&lt;td&gt;Basic Git operations&lt;/td&gt;
&lt;td&gt;Yes — works fine&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PowerShell&lt;/td&gt;
&lt;td&gt;Full support&lt;/td&gt;
&lt;td&gt;Advanced Windows tasks&lt;/td&gt;
&lt;td&gt;Yes — works fine&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VS Code Terminal&lt;/td&gt;
&lt;td&gt;Full support&lt;/td&gt;
&lt;td&gt;All-in-one development&lt;/td&gt;
&lt;td&gt;Yes — very powerful&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  The Git Commands You Actually Need
&lt;/h3&gt;

&lt;p&gt;Git has many commands, but you only need around ten for everyday work. Here they are in plain English.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First-Time Setup — Do This Once Per Computer&lt;/strong&gt;&lt;br&gt;
Before using Git, tell it who you are:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;git config --global user.name "Your Name"&lt;br&gt;
git config --global user.email "your@email.com"&lt;br&gt;
git config --list&lt;/code&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Press Enter with each command even if you see nothing showing. This means it is working&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The last command confirms it worked. You should see your name and email displayed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Essential Commands&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;th&gt;Real Life Equivalent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;git init&lt;/td&gt;
&lt;td&gt;Starts Git tracking in a folder&lt;/td&gt;
&lt;td&gt;Hiring a security camera for your folder&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;git status&lt;/td&gt;
&lt;td&gt;Shows what changed&lt;/td&gt;
&lt;td&gt;Checking your to-do list&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;git add .&lt;/td&gt;
&lt;td&gt;Prepares all files for saving&lt;/td&gt;
&lt;td&gt;Putting everything in an envelope&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;git commit -m "msg"&lt;/td&gt;
&lt;td&gt;Saves a version with a label&lt;/td&gt;
&lt;td&gt;Sealing the envelope and labelling it&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;git push&lt;/td&gt;
&lt;td&gt;Sends code to GitHub&lt;/td&gt;
&lt;td&gt;Handing the envelope to a courier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;git pull&lt;/td&gt;
&lt;td&gt;Gets latest code from GitHub&lt;/td&gt;
&lt;td&gt;Receiving a delivery&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;git clone&lt;/td&gt;
&lt;td&gt;Downloads a full project&lt;/td&gt;
&lt;td&gt;Borrowing a book from a library&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;git log&lt;/td&gt;
&lt;td&gt;Shows history of all commits&lt;/td&gt;
&lt;td&gt;Reading your diary from the beginning&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The Complete Step-by-Step Workflow: First-Time Push to GitHub&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Open your terminal (Git Bash, Anaconda Prompt, VS Code Terminal, etc.) and navigate to your project folder:&lt;/p&gt;

&lt;p&gt;**Step 1 — Navigate to your project folder&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cd "C:\Users\YourName\Documents\your_project"&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The cd command means Change Directory. If your folder name has spaces, wrap the path in quotes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2 — Start Git tracking&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;git init&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You will see: &lt;code&gt;Initialized empty Git repository&lt;/code&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This creates a local Git repository in your folder, allowing Git to track changes.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Git is now watching/tracking this folder.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3 — Add your files&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;git add .&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The dot means &lt;em&gt;add everything&lt;/em&gt;. &lt;br&gt;
To add one specific file, replace the dot with the filename.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 4 — Commit your changes&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;git commit -m "first trial"&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is the actual save. The message in quotes describes what you did. Make it meaningful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 5 — Create a repository on GitHub&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note:  Do not initialize it with a README — this avoids merge conflicts.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;Go to github.com and log in&lt;/li&gt;
&lt;li&gt;Click the + button in the top right corner&lt;/li&gt;
&lt;li&gt;Click New Repository&lt;/li&gt;
&lt;li&gt;Give it a name with no spaces — use hyphens instead&lt;/li&gt;
&lt;li&gt;Click Create Repository&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 6 — Connect your computer to GitHub.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;git branch -M main&lt;/code&gt;&lt;br&gt;
Click Enter&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Many tutorials and GitHub now use main as the default branch name.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;code&gt;git remote add origin https://USERNAME:TOKEN@github.com/USERNAME/reponame.git&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Click enter&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Replace:&lt;br&gt;
USERNAME → your GitHub username&lt;br&gt;
TOKEN → your personal access token&lt;br&gt;
reponame → your repository name&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For example:&lt;br&gt;
git remote add origin &lt;a href="https://Isika01:hgwghpskskjAKJYVE31t7Ia@github.com/Isika-01/Revision_Assignment.git" rel="noopener noreferrer"&gt;https://Isika01:hgwghpskskjAKJYVE31t7Ia@github.com/Isika-01/Revision_Assignment.git&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;_Note: _GitHub no longer allows passwords for authentication. You must use a personal access token.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Step 7 — Push your code to GitHub&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;git push -u origin main&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;If it works, you will see objects being written and a success message. &lt;/p&gt;

&lt;p&gt;Then check GitHub — your files will be there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Every Update After That — Just Three Commands&lt;/strong&gt;&lt;br&gt;
After the first-time setup, every future update is only three commands. That is all. &lt;/p&gt;

&lt;p&gt;&lt;code&gt;git add .&lt;br&gt;
git commit -m "describe what you changed"&lt;br&gt;
git push&lt;/code&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;GitHub Personal Access Tokens — The Part Nobody Explains Well&lt;/strong&gt;&lt;br&gt;
I want to be upfront about this one too. When I first tried to push to GitHub and it asked for a password, I typed my regular GitHub password and got an error. Then I spent an hour confused before discovering that GitHub had changed how authentication works.&lt;/p&gt;

&lt;p&gt;GitHub no longer accepts your regular account password for Git operations. Instead, you need a Personal Access Token. Think of it as a special password created specifically for Git.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Creating Your Token&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to github.com and log in&lt;/li&gt;
&lt;li&gt;Click your profile picture in the top right corner&lt;/li&gt;
&lt;li&gt;Click Settings&lt;/li&gt;
&lt;li&gt;Scroll all the way down and click Developer Settings&lt;/li&gt;
&lt;li&gt;Click Personal Access Tokens then Tokens classic&lt;/li&gt;
&lt;li&gt;Click Generate new token classic&lt;/li&gt;
&lt;li&gt;Set the expiration to No expiration&lt;/li&gt;
&lt;li&gt;Check the repo, workflow and read:org boxes&lt;/li&gt;
&lt;li&gt;Click Generate Token&lt;/li&gt;
&lt;li&gt;COPY IT IMMEDIATELY — GitHub will never show it again&lt;/li&gt;
&lt;li&gt;Save it in a Notepad file somewhere private on your computer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Token Safety Rules&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Save token privately in Notepad. Do not share it with anyone &lt;/li&gt;
&lt;li&gt;It acts as your Git password so do not post it online&lt;/li&gt;
&lt;li&gt;If lost, delete and regenerate it&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Common Git Errors and What They Actually Mean&lt;/strong&gt;&lt;br&gt;
Errors in Git are normal. Even experienced developers get them regularly. Here are the ones you are most likely to see:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;th&gt;What It Means&lt;/th&gt;
&lt;th&gt;How to Fix It&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;remote origin already exists&lt;/td&gt;
&lt;td&gt;You already connected to GitHub&lt;/td&gt;
&lt;td&gt;Run &lt;code&gt;git remote remove origin&lt;/code&gt; then add it again&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;fatal: not a git repository&lt;/td&gt;
&lt;td&gt;Git is not tracking this folder&lt;/td&gt;
&lt;td&gt;Run &lt;code&gt;git init&lt;/code&gt; first&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Authentication failed&lt;/td&gt;
&lt;td&gt;Wrong username or token&lt;/td&gt;
&lt;td&gt;Check your token is correct&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Could not read from remote&lt;/td&gt;
&lt;td&gt;Using SSH format instead of HTTPS&lt;/td&gt;
&lt;td&gt;Use the &lt;code&gt;https://&lt;/code&gt; URL format&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;nothing to commit&lt;/td&gt;
&lt;td&gt;No files have been changed&lt;/td&gt;
&lt;td&gt;Edit your files first then add and commit&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Best Practices for Professional Git Use&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Write Meaningful Commit Messages&lt;/li&gt;
&lt;li&gt;Your commit messages are a permanent record of your work. Future employers will read them. Make them count.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Good Messages&lt;/th&gt;
&lt;th&gt;Bad Messages&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Completed SQL assignment with JOIN queries&lt;/td&gt;
&lt;td&gt;stuff&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fixed error in data cleaning function&lt;/td&gt;
&lt;td&gt;changes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Added visualisation charts for sales report&lt;/td&gt;
&lt;td&gt;update&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Updated README with setup instructions&lt;/td&gt;
&lt;td&gt;aaa&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Resolved login bug reported in code review&lt;/td&gt;
&lt;td&gt;final&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Repository Naming Rules&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use hyphens or underscores instead of spaces&lt;/li&gt;
&lt;li&gt;Be descriptive so you remember what the project is&lt;/li&gt;
&lt;li&gt;Use lowercase letters for consistency&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Good Repository Names&lt;/th&gt;
&lt;th&gt;Bad Repository Names&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;sql-assignment-week1&lt;/td&gt;
&lt;td&gt;Python Data Analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;powerbi-sales-dashboard&lt;/td&gt;
&lt;td&gt;(See above: spaces make it bad)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;Keep Your GitHub Active&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Your GitHub profile shows your commit activity over time. Even small daily commits build a strong profile. Treat every assignment as an opportunity to add to your portfolio.&lt;br&gt;
Employers look at how consistently you push code.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Sometimes the broken path teaches you more than the smooth one.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>git</category>
      <category>githubactions</category>
      <category>versioncontrol</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Mastering SQL Joins and Window Functions with Real Examples</title>
      <dc:creator>Isika Millicent</dc:creator>
      <pubDate>Mon, 09 Mar 2026 11:11:43 +0000</pubDate>
      <link>https://dev.to/isika_millicent/mastering-sql-joins-and-window-functions-with-real-examples-5a6n</link>
      <guid>https://dev.to/isika_millicent/mastering-sql-joins-and-window-functions-with-real-examples-5a6n</guid>
      <description>&lt;p&gt;If you have been writing SQL for a while, you have probably come across &lt;strong&gt;joins&lt;/strong&gt; and &lt;strong&gt;window functions&lt;/strong&gt;. These two features appear in a large percentage of real-world SQL queries, and for good reason. They are among the most powerful tools SQL has to offer.&lt;/p&gt;

&lt;p&gt;However, they can be confusing at first. Understanding how tables connect with joins and how window functions analyze data across rows is a key step in becoming comfortable with SQL.&lt;/p&gt;

&lt;p&gt;In this article, we will walk through the most common types of joins using a simple example and explain how they work.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. What are Joins?
&lt;/h2&gt;

&lt;p&gt;In SQL, joins are used to combine rows from two or more tables based on a related column between them.&lt;br&gt;
Think of it as sliding two spreadsheets together so that matching values line up, the join type determines what you do with rows that have no partner on the other side.&lt;/p&gt;

&lt;p&gt;To understand joins better, we will use the following two tables. &lt;/p&gt;

&lt;p&gt;In the Customers table, &lt;code&gt;customer_id&lt;/code&gt; is the &lt;strong&gt;primary key&lt;/strong&gt;.&lt;br&gt;
In the Orders table, &lt;code&gt;order_id&lt;/code&gt; is the primary key, while &lt;code&gt;customer_id&lt;/code&gt; acts as a &lt;strong&gt;foreign key&lt;/strong&gt; that references the Customers table.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Table 1: Customers Table&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;customer_name&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="c1"&gt;------------|---------------|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;          &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Alice&lt;/span&gt;         &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;          &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Bob&lt;/span&gt;           &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;          &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Charlie&lt;/span&gt;       &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;          &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Diana&lt;/span&gt;         &lt;span class="o"&gt;|&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Table 2: Orders Table&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;order_id&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;total_amount&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="c1"&gt;----------|------------|--------------|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;101&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;          &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;250&lt;/span&gt;          &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;102&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;          &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;150&lt;/span&gt;          &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;103&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;          &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;          &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;104&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;          &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;          &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;105&lt;/span&gt;      &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;          &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;          &lt;span class="o"&gt;|&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;NB:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customer 4 (Diana) has no orders&lt;/li&gt;
&lt;li&gt;Order 105 belongs to customer_id 5, which does not exist in Customers.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  A) INNER Join
&lt;/h3&gt;

&lt;p&gt;Returns rows where there is a match in both tables. &lt;br&gt;
If there’s no match, the result set will not include those records.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SELECT c.customer_name, o.order_id, o.total_amount&lt;br&gt;
FROM Customers c&lt;br&gt;
INNER JOIN Orders o&lt;br&gt;
ON c.customer_id = o.customer_id;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Result&lt;/em&gt;: The above query returns only customers who actually have orders.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;   &lt;span class="n"&gt;order_id&lt;/span&gt;    &lt;span class="n"&gt;total_amount&lt;/span&gt;
&lt;span class="n"&gt;Alice&lt;/span&gt;             &lt;span class="mi"&gt;101&lt;/span&gt;          &lt;span class="mi"&gt;250&lt;/span&gt;
&lt;span class="n"&gt;Alice&lt;/span&gt;             &lt;span class="mi"&gt;103&lt;/span&gt;          &lt;span class="mi"&gt;300&lt;/span&gt;
&lt;span class="n"&gt;Bob&lt;/span&gt;               &lt;span class="mi"&gt;102&lt;/span&gt;          &lt;span class="mi"&gt;150&lt;/span&gt;
&lt;span class="n"&gt;Charlie&lt;/span&gt;           &lt;span class="mi"&gt;104&lt;/span&gt;          &lt;span class="mi"&gt;200&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;NB&lt;/strong&gt;&lt;br&gt;
Notice that Diana (no orders) and Order 105 (no matching customer) are both absent. Only rows with a valid match on both sides make it through.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkpdu862ctuw1d6y9omuu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkpdu862ctuw1d6y9omuu.png" alt="Inner Join" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This is the most common join. &lt;br&gt;
It is mostly when you want records that exist in both tables eg orders with valid customers, students with enrolled courses.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  B. LEFT(OUTER) Join
&lt;/h3&gt;

&lt;p&gt;This join returns all records from the left table and the matched records from the right table.&lt;br&gt;
If there’s no match, NULL values are returned for the right table’s columns.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In this case, the left table is the customers table and on the right is the orders table. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;code&gt;SELECT c.customer_name, o.order_id, o.total_amount&lt;br&gt;
FROM Customers c&lt;br&gt;
LEFT JOIN Orders o&lt;br&gt;
ON c.customer_id = o.customer_id;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Result&lt;/em&gt;: The above query returns all customers even those with no orders.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;   &lt;span class="n"&gt;order_id&lt;/span&gt;   &lt;span class="n"&gt;total_amount&lt;/span&gt;
&lt;span class="n"&gt;Alice&lt;/span&gt;              &lt;span class="mi"&gt;101&lt;/span&gt;        &lt;span class="mi"&gt;250&lt;/span&gt;
&lt;span class="n"&gt;Alice&lt;/span&gt;              &lt;span class="mi"&gt;103&lt;/span&gt;        &lt;span class="mi"&gt;300&lt;/span&gt;
&lt;span class="n"&gt;Bob&lt;/span&gt;                &lt;span class="mi"&gt;102&lt;/span&gt;        &lt;span class="mi"&gt;150&lt;/span&gt;
&lt;span class="n"&gt;Charlie&lt;/span&gt;            &lt;span class="mi"&gt;104&lt;/span&gt;        &lt;span class="mi"&gt;200&lt;/span&gt;
&lt;span class="n"&gt;Diana&lt;/span&gt;              &lt;span class="k"&gt;NULL&lt;/span&gt;       &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;NB&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Diana now appears in the results, even though she has no orders — her order columns simply return NULL.&lt;/li&gt;
&lt;li&gt;Left Join is the foundation of the anti-join pattern — filtering to WHERE o.order_id IS NULL gives you only the customers who have never placed an order.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;B.i) Anti Join&lt;/strong&gt;&lt;br&gt;
An Anti Join is used when you want to find records in one table that do not have a corresponding match in another table.&lt;/p&gt;

&lt;p&gt;For Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Find customers who haven’t purchased anything&lt;/li&gt;
&lt;li&gt;Identify products with no sales&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Using the above table, LEFT JOIN + WHERE NULL&lt;br&gt;
&lt;code&gt;SELECT c.customer_id, c.customer_name&lt;br&gt;
FROM Customers c&lt;br&gt;
LEFT JOIN Orders o&lt;br&gt;
ON c.customer_id = o.customer_id&lt;br&gt;
WHERE o.order_id IS NULL;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Result: For customers without matching orders, &lt;code&gt;order_id&lt;/code&gt;returns &lt;code&gt;NULL&lt;/code&gt;. &lt;br&gt;
The WHERE o.order_id IS NULL condition filters these unmatched rows.&lt;br&gt;
This is the anti join pattern.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;customer_id    customer_name
4                 Diana
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  C) RIGHT Join
&lt;/h3&gt;

&lt;p&gt;Returns all rows from the right table and matching rows from the left table.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SELECT c.customer_name, o.order_id, o.total_amount&lt;br&gt;
FROM Customers c&lt;br&gt;
RIGHT JOIN Orders o&lt;br&gt;
ON c.customer_id = o.customer_id;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Result&lt;/em&gt;: The above query returns all orders, even those without matching customers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;   &lt;span class="n"&gt;order_id&lt;/span&gt;   &lt;span class="n"&gt;total_amount&lt;/span&gt;
&lt;span class="n"&gt;Alice&lt;/span&gt;             &lt;span class="mi"&gt;101&lt;/span&gt;         &lt;span class="mi"&gt;250&lt;/span&gt;
&lt;span class="n"&gt;Alice&lt;/span&gt;             &lt;span class="mi"&gt;103&lt;/span&gt;         &lt;span class="mi"&gt;300&lt;/span&gt;
&lt;span class="n"&gt;Bob&lt;/span&gt;               &lt;span class="mi"&gt;102&lt;/span&gt;         &lt;span class="mi"&gt;150&lt;/span&gt;
&lt;span class="n"&gt;Charlie&lt;/span&gt;           &lt;span class="mi"&gt;104&lt;/span&gt;         &lt;span class="mi"&gt;200&lt;/span&gt;
&lt;span class="k"&gt;NULL&lt;/span&gt;              &lt;span class="mi"&gt;105&lt;/span&gt;         &lt;span class="mi"&gt;100&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  d) FULL OUTER Join
&lt;/h3&gt;

&lt;p&gt;Returns all rows when there is a match in one of the tables.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SELECT c.customer_name, o.order_id, o.total_amount&lt;br&gt;
FROM Customers c&lt;br&gt;
FULL OUTER JOIN Orders o&lt;br&gt;
ON c.customer_id = o.customer_id;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Result&lt;/em&gt;: Includes all customers and all orders, matching where possible. Combine both LEFT + RIGHT and everything shown&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;   &lt;span class="n"&gt;order_id&lt;/span&gt;    &lt;span class="n"&gt;total_amount&lt;/span&gt;
&lt;span class="n"&gt;Alice&lt;/span&gt;             &lt;span class="mi"&gt;101&lt;/span&gt;           &lt;span class="mi"&gt;250&lt;/span&gt;
&lt;span class="n"&gt;Alice&lt;/span&gt;             &lt;span class="mi"&gt;103&lt;/span&gt;           &lt;span class="mi"&gt;300&lt;/span&gt;
&lt;span class="n"&gt;Bob&lt;/span&gt;               &lt;span class="mi"&gt;102&lt;/span&gt;           &lt;span class="mi"&gt;150&lt;/span&gt;
&lt;span class="n"&gt;Charlie&lt;/span&gt;           &lt;span class="mi"&gt;104&lt;/span&gt;           &lt;span class="mi"&gt;200&lt;/span&gt;
&lt;span class="n"&gt;Diana&lt;/span&gt;             &lt;span class="k"&gt;NULL&lt;/span&gt;          &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="k"&gt;NULL&lt;/span&gt;              &lt;span class="mi"&gt;105&lt;/span&gt;          &lt;span class="mi"&gt;100&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  e) SELF Join
&lt;/h3&gt;

&lt;p&gt;A Self Join is when a table is joined with itself.&lt;/p&gt;

&lt;p&gt;Instead of joining two different tables, you treat the same table as if it were two separate tables by using table aliases.&lt;/p&gt;

&lt;p&gt;Self joins are commonly used when a table contains hierarchical or related data within itself, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;employees and managers&lt;/li&gt;
&lt;li&gt;categories and subcategories&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example Table: Employees&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;employee_id&lt;/span&gt;     &lt;span class="n"&gt;employee_name&lt;/span&gt;    &lt;span class="n"&gt;manager_id&lt;/span&gt;
&lt;span class="mi"&gt;1&lt;/span&gt;                  &lt;span class="n"&gt;Alice&lt;/span&gt;           &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="mi"&gt;2&lt;/span&gt;                  &lt;span class="n"&gt;Bob&lt;/span&gt;             &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="mi"&gt;3&lt;/span&gt;                  &lt;span class="n"&gt;Charlie&lt;/span&gt;         &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="mi"&gt;4&lt;/span&gt;                  &lt;span class="n"&gt;Diana&lt;/span&gt;           &lt;span class="mi"&gt;2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;NB&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Alice is the top manager (no manager).&lt;/li&gt;
&lt;li&gt;Bob and Charlie report to Alice.&lt;/li&gt;
&lt;li&gt;Diana reports to Bob.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;SELECT &lt;br&gt;
e.employee_name AS employee,&lt;br&gt;
m.employee_name AS manager&lt;br&gt;
FROM employees e&lt;br&gt;
LEFT JOIN employees m&lt;br&gt;
ON e.manager_id = m.employee_id;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Result&lt;/em&gt;: The above query essentially asks: "For each employee, find the person whose employee_id matches their manager_id."&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;employee&lt;/span&gt;     &lt;span class="n"&gt;manager&lt;/span&gt;
&lt;span class="n"&gt;Alice&lt;/span&gt;         &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="n"&gt;Bob&lt;/span&gt;           &lt;span class="n"&gt;Alice&lt;/span&gt;
&lt;span class="n"&gt;Charlie&lt;/span&gt;       &lt;span class="n"&gt;Alice&lt;/span&gt;
&lt;span class="n"&gt;Diana&lt;/span&gt;         &lt;span class="n"&gt;Bob&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let me explain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The employees table is referenced twice in the query.&lt;/li&gt;
&lt;li&gt;e represents the employee.&lt;/li&gt;
&lt;li&gt;m represents the manager.&lt;/li&gt;
&lt;li&gt;We match manager_id from one instance of the table to employee_id in the other.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  2. Window Functions
&lt;/h2&gt;

&lt;p&gt;While joins help you combine data across tables, window functions help you analyze data across rows — without collapsing those rows into a single result the way &lt;code&gt;GROUP BY&lt;/code&gt; does.&lt;/p&gt;

&lt;p&gt;This is the key distinction: &lt;code&gt;GROUP BY&lt;/code&gt; aggregates rows and destroys their individual identity. A window function performs the same kind of calculation but preserves every row, attaching the result as a new column alongside the original data.&lt;/p&gt;

&lt;p&gt;The syntax uses &lt;strong&gt;OVER()&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The OVER() clause is what transforms an ordinary aggregate into a window function. An empty OVER() applies the function across the entire result set. &lt;br&gt;
PARTITION BY subdivides that set into groups, restarting the calculation for each one.&lt;/p&gt;
&lt;h3&gt;
  
  
  a) Ranking Functions (ROW_NUMBER, RANK, DENSE_RANK)
&lt;/h3&gt;

&lt;p&gt;Suppose you want to rank each order by its &lt;code&gt;total_amount&lt;/code&gt; — highest to lowest. &lt;br&gt;
Three ranking functions can do this, and their difference only shows up when values tie.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;-- Always unique; no ties are possible (1, 2, 3, 4 …)&lt;/span&gt;
&lt;span class="n"&gt;ROW_NUMBER&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;row_num&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;-- Tied rows share a rank; the next rank skips (1, 1, 3 …)&lt;/span&gt;
&lt;span class="n"&gt;RANK&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;rnk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;-- Tied rows share a rank; no ranks are skipped (1, 1, 2 …)&lt;/span&gt;
&lt;span class="n"&gt;DENSE_RANK&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;dense_rnk&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;Orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;
&lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;Customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;order_id&lt;/th&gt;
&lt;th&gt;customer_name&lt;/th&gt;
&lt;th&gt;total_amount&lt;/th&gt;
&lt;th&gt;row_num&lt;/th&gt;
&lt;th&gt;rnk&lt;/th&gt;
&lt;th&gt;dense_rnk&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;103&lt;/td&gt;
&lt;td&gt;Alice&lt;/td&gt;
&lt;td&gt;300&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;101&lt;/td&gt;
&lt;td&gt;Alice&lt;/td&gt;
&lt;td&gt;250&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;104&lt;/td&gt;
&lt;td&gt;Charlie&lt;/td&gt;
&lt;td&gt;200&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;102&lt;/td&gt;
&lt;td&gt;Bob&lt;/td&gt;
&lt;td&gt;150&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;NB:&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Order 105 does not appear because it has no matching customer — the INNER JOIN filters it out. &lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  b) Aggregates Over a Window — SUM, AVG, COUNT
&lt;/h3&gt;

&lt;p&gt;One of the most powerful things window functions enable is comparing each row against an overall or group-level aggregate — without &lt;code&gt;GROUP BY&lt;/code&gt; collapsing your data.&lt;/p&gt;

&lt;p&gt;Here, we calculate the total revenue across all orders and each order's percentage share of that total:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;grand_total&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;Orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;
&lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;Customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;order_id&lt;/th&gt;
&lt;th&gt;customer_name&lt;/th&gt;
&lt;th&gt;total_amount&lt;/th&gt;
&lt;th&gt;grand_total&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;101&lt;/td&gt;
&lt;td&gt;Alice&lt;/td&gt;
&lt;td&gt;250&lt;/td&gt;
&lt;td&gt;900&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;102&lt;/td&gt;
&lt;td&gt;Bob&lt;/td&gt;
&lt;td&gt;150&lt;/td&gt;
&lt;td&gt;900&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;103&lt;/td&gt;
&lt;td&gt;Alice&lt;/td&gt;
&lt;td&gt;300&lt;/td&gt;
&lt;td&gt;900&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;104&lt;/td&gt;
&lt;td&gt;Charlie&lt;/td&gt;
&lt;td&gt;200&lt;/td&gt;
&lt;td&gt;900&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;NB&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Notice that &lt;code&gt;grand_total&lt;/code&gt; is the same on every row.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;OVER()&lt;/code&gt; means "look across the entire result set." &lt;/li&gt;
&lt;li&gt;Each row keeps its own identity while also having access to the overall total.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  c) LAG and LEAD — Comparing Orders Across Rows
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;LAG&lt;/code&gt; looks at the &lt;em&gt;previous&lt;/em&gt; row's value and &lt;code&gt;LEAD&lt;/code&gt; looks at the &lt;em&gt;next&lt;/em&gt; row's value. &lt;br&gt;
Here, we can use &lt;code&gt;LAG&lt;/code&gt; to show how much each of Alice's orders changed compared to her previous one.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;-- We are now looking for the previous order's amount for the same customer&lt;/span&gt;
    &lt;span class="c1"&gt;-- o.total_amount is the column we want to “look back” at.&lt;/span&gt;
    &lt;span class="c1"&gt;-- the` 1 `returns how many rows back we want to look (1 row back in this case)&lt;/span&gt;
&lt;span class="n"&gt;LAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;
        &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;prev_order_amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;-- Difference from the previous order using minus sign&lt;/span&gt;
 &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;LAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
       &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;
       &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;change_from_prev&lt;/span&gt;

&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;Orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;
&lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;Customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;order_id&lt;/th&gt;
&lt;th&gt;customer_name&lt;/th&gt;
&lt;th&gt;total_amount&lt;/th&gt;
&lt;th&gt;prev_order_amount&lt;/th&gt;
&lt;th&gt;change_from_prev&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;101&lt;/td&gt;
&lt;td&gt;Alice&lt;/td&gt;
&lt;td&gt;250&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;103&lt;/td&gt;
&lt;td&gt;Alice&lt;/td&gt;
&lt;td&gt;300&lt;/td&gt;
&lt;td&gt;250&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;102&lt;/td&gt;
&lt;td&gt;Bob&lt;/td&gt;
&lt;td&gt;150&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;104&lt;/td&gt;
&lt;td&gt;Charlie&lt;/td&gt;
&lt;td&gt;200&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;td&gt;NULL&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;NB&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Alice's second order (103) was 50 more than her first (101). Bob and - Charlie each only have one order, so their &lt;code&gt;prev_order_amount&lt;/code&gt; returns &lt;code&gt;NULL&lt;/code&gt; — there is no previous row to look back at.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  d) NTILE — Bucketing Customers by Spend
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;NTILE(n)&lt;/code&gt; divides rows into &lt;em&gt;n&lt;/em&gt; roughly equal buckets and assigns each row a bucket number. It is perfect for segmenting customers by how much they have spent.&lt;/p&gt;

&lt;p&gt;Since Diana has no orders, we first join and aggregate, then apply &lt;code&gt;NTILE&lt;/code&gt; to rank the customers who do have orders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;                               &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_spent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="n"&gt;NTILE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;spend_tier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="n"&gt;NTILE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;OVER&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'High Spender'&lt;/span&gt;
&lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="s1"&gt;'Mid Spender'&lt;/span&gt;
&lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="s1"&gt;'Low Spender'&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;segment&lt;/span&gt;

&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;Orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;
&lt;span class="k"&gt;INNER&lt;/span&gt; &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;Customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_name&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;total_spent&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;customer_name&lt;/th&gt;
&lt;th&gt;total_spent&lt;/th&gt;
&lt;th&gt;spend_tier&lt;/th&gt;
&lt;th&gt;segment&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Alice&lt;/td&gt;
&lt;td&gt;550&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;High Spender&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Charlie&lt;/td&gt;
&lt;td&gt;200&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Mid Spender&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bob&lt;/td&gt;
&lt;td&gt;150&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Low Spender&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;*&lt;em&gt;NB&lt;/em&gt;&lt;br&gt;
What NTILE(3) does is divide the results into 3 equal groups (tiles).&lt;/p&gt;

&lt;p&gt;Alice, with a combined spend of 550, lands in Tier 1. Bob and Charlie each have a single order and are bucketed accordingly. Diana does not appear because she has no orders in the &lt;code&gt;Orders&lt;/code&gt; table.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In conclusion, as mentioned earlier, these two concepts require a lot of practice to master. The best way to build mastery is to work with a dataset you already have and understand or a spreadsheet you have exported and deliberately write one query using each join type and one query using each of the window functions covered here.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>sql</category>
      <category>tutorial</category>
      <category>learning</category>
    </item>
    <item>
      <title>Connecting Power BI to SQL Databases: A Comprehensive Guide</title>
      <dc:creator>Isika Millicent</dc:creator>
      <pubDate>Sun, 08 Mar 2026 20:17:52 +0000</pubDate>
      <link>https://dev.to/isika_millicent/connecting-power-bi-to-sql-databases-a-comprehensive-guide-1mpf</link>
      <guid>https://dev.to/isika_millicent/connecting-power-bi-to-sql-databases-a-comprehensive-guide-1mpf</guid>
      <description>&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;What Is Power BI? 

&lt;ul&gt;
&lt;li&gt;What Power BI Actually Is&lt;/li&gt;
&lt;li&gt;Why Connect Power BI to a Database — Not Just a Spreadsheet?&lt;/li&gt;
&lt;li&gt;What Is a SQL Database?&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Connecting Power BI to a Local PostgreSQL Database&lt;/li&gt;
&lt;li&gt;Connecting to Aiven — A Cloud PostgreSQL Database&lt;/li&gt;
&lt;li&gt;Loading Tables and Building the Data Model&lt;/li&gt;
&lt;li&gt;Why SQL Skills Are Important for Power BI Analysts&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;Before we dive into all this, take a moment to think about your own data setup:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;How many different sources does your organization use? (Excel, Google Sheets, CSV exports, CRM systems?)&lt;/li&gt;
&lt;li&gt;Do multiple people ever overwrite each other’s work?&lt;/li&gt;
&lt;li&gt;How long does it take to update reports or dashboards?&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;If you answered “more than once” or “too long” to any of these, connecting Power BI to a SQL database could save you hours or even days, every month.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  1. What Is Power BI?
&lt;/h2&gt;

&lt;p&gt;If you’ve ever looked at a large spreadsheet full of rows and columns, you know how difficult it can be to quickly understand what the data is actually telling you. Important insights can easily get buried in raw numbers, making it hard to see patterns, trends, or key business metrics.&lt;/p&gt;

&lt;p&gt;Look at this Excel sheet for example and see how hard it is to interpret the data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2u0mr0y31elckuwvbjii.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2u0mr0y31elckuwvbjii.png" alt="Excel Sheet" width="800" height="250"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This challenge exists whether you’re a business owner, a student learning data analytics, or a professional working with large datasets.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This is where Power BI comes in.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Power BI is a business intelligence tool developed by Microsoft that allows users to connect to multiple data sources, transform and analyze data, and build interactive dashboards and reports.&lt;/p&gt;

&lt;p&gt;Using visual elements such as charts, maps, tables, and KPIs, Power BI turns raw data into clear, actionable insights that are easy to understand.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwlurtvo7va34o0h4jzj3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwlurtvo7va34o0h4jzj3.png" alt="PowerBi Vizualization" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;While Power BI can connect to many different data sources such as Excel files and cloud services, one of the most common and powerful integrations is with SQL databases.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsynbyysjglgiieuctjc5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsynbyysjglgiieuctjc5.png" alt="PowerBi Data Source" width="600" height="846"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Many organizations store their operational data in relational databases like PostgreSQL, MySQL, or Microsoft SQL Server. These databases contain structured information such as transactions, customer records, product inventories etc.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Role of SQL Databases in Analytics&lt;/strong&gt;&lt;br&gt;
SQL stands for Structured Query Language. &lt;br&gt;
A SQL database organizes data into tables, which look similar to spreadsheets(rows and columns) but with strict rules about what each column can contain and how tables relate to each other.&lt;/p&gt;

&lt;p&gt;You use SQL, the language, to ask questions of the database, like: "Show me all properties in Nairobi that sold for more than 10 million shillings this year."&lt;/p&gt;

&lt;p&gt;For analytical workloads, SQL databases serve as the foundation from which BI tools like Power BI extract, transform, and visualize data. &lt;/p&gt;

&lt;p&gt;Whether it is customer records, product catalogues, sales transactions, or inventory logs, the database acts as the main storage of business data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Connect Power BI to a Database not Just a Spreadsheet?(for example, a file created in Excel)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Many businesses start by storing their data in Excel spreadsheets. For small datasets, this works fine. &lt;/p&gt;

&lt;p&gt;However, as the volume of data grows, spreadsheets quickly become difficult to manage.&lt;/p&gt;

&lt;p&gt;Let us consider a property management company tracking 5,000 rental units across multiple cities. Their data includes tenant records, property details, rental payments, and transaction histories. Managing all of this in a single spreadsheet would be inefficient and prone to errors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A SQL database solves these problems in several ways&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scalability: Databases can store millions of rows of data without slowing down.&lt;/li&gt;
&lt;li&gt;Multi-user access: Multiple users can read and update data simultaneously without overwriting each other’s work.&lt;/li&gt;
&lt;li&gt;Structured data integrity: Data is stored in well-organized tables with defined relationships and rules. For example, every property can be linked to a valid agent ID, preventing incomplete or inconsistent records.&lt;/li&gt;
&lt;li&gt;Automated reporting: Power BI can connect directly to the database and refresh reports automatically, ensuring dashboards always reflect the most recent data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the next section, we’ll walk through how to connect Power BI to a SQL database step by step.&lt;/p&gt;

&lt;p&gt;Throughout this guide, we use &lt;strong&gt;Power BI Desktop&lt;/strong&gt; — that is where all the connecting, modelling, and building happens.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Connecting Power BI to a Local PostgreSQL Database
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;"Local"&lt;/strong&gt; simply means the database is installed on your own computer.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Step 1&lt;/em&gt;: &lt;strong&gt;Open Power BI Desktop&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Launch Power BI Desktop and navigate to the Home tab. &lt;/li&gt;
&lt;li&gt;Click Get Data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Step 2&lt;/em&gt;: &lt;strong&gt;Select PostgreSQL as the Data Source&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In the Get Data window, search for PostgreSQL database and select it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Step 3&lt;/em&gt;: &lt;strong&gt;Enter the Server and Database Details&lt;/strong&gt;&lt;br&gt;
In the connection dialog box, enter the following information:&lt;/p&gt;

&lt;p&gt;Server: &lt;code&gt;localhost&lt;/code&gt;&lt;br&gt;
Database: &lt;code&gt;lux_sales&lt;/code&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;If PostgreSQL is installed locally, the server's name localhost refers to your own computer.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Power BI will then prompt you to select a connection mode:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Import Mode – copies the data into Power BI for faster analysis.&lt;/li&gt;
&lt;li&gt;Direct Query Mode – queries the database in real time without storing the data in Power BI.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Most beginners use Import Mode because it provides faster performance and is easier to work with during analysis.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;Step 4&lt;/em&gt;: &lt;strong&gt;Enter Credentials&lt;/strong&gt;&lt;br&gt;
Next, Power BI will ask for database authentication details:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Username&lt;/li&gt;
&lt;li&gt;Password&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are the same credentials used to log into the PostgreSQL database.&lt;br&gt;
After authentication, Power BI establishes the connection to the database.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Step 5&lt;/em&gt;: &lt;strong&gt;Select Tables in the Navigator&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once connected, the Navigator window displays the available tables in the database.&lt;/p&gt;

&lt;p&gt;Users can preview the tables before loading them into Power BI.&lt;/p&gt;

&lt;p&gt;Select the required tables and click Load.&lt;/p&gt;

&lt;p&gt;The tables will now appear in Power BI’s Data View and will be available for analysis.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Connecting Power BI to a Cloud PostgreSQL Database (Aiven)
&lt;/h2&gt;

&lt;p&gt;In many organizations, databases are hosted in the cloud instead of on local machines. One platform that provides managed PostgreSQL databases is Aiven.&lt;/p&gt;

&lt;p&gt;With managed services like this, organizations do not need to worry about maintaining servers, backups, or updates. Instead, they can focus on analyzing their data.&lt;/p&gt;

&lt;p&gt;A typical analytics workflow might involve several tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Aiven - hosting the database in the cloud&lt;/li&gt;
&lt;li&gt;PostgreSQL - storing the structured data&lt;/li&gt;
&lt;li&gt;DBeaver - for exploring and querying the database&lt;/li&gt;
&lt;li&gt;Power BI - for building dashboards and visualizations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frfqtzxft1gxs6ohdwnr2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frfqtzxft1gxs6ohdwnr2.png" alt="Cloud Connection" width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Step 1&lt;/em&gt;: &lt;strong&gt;Obtain Database Connection Details from Aiven&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;To connect Power BI to a cloud database, you first need the connection details provided by Aiven.&lt;/li&gt;
&lt;li&gt;Inside the Aiven dashboard, you will find the following parameters:
&lt;code&gt;Host, Port, Database name, Username, Password&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It would look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fumq2peegp08flbd7hnuq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fumq2peegp08flbd7hnuq.png" alt="Aiven" width="800" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These details are required for any tool connecting to the database, including Power BI or DBeaver.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Step 2&lt;/em&gt;: &lt;strong&gt;Connect Power BI to PostgreSQL&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Click Get Data&lt;/li&gt;
&lt;li&gt;Select PostgreSQL database and you will get this pop up&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2oeji2ddlcdig67r3jfy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2oeji2ddlcdig67r3jfy.png" alt="Pop up" width="800" height="415"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enter the connection details from Aiven:&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;Host → pg-3b04de5f-millicentisika-2259.j.aivencloud.com&lt;br&gt;
Port → 11547&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pg-3b04de5f-millicentisika-2259.j.aivencloud.com:11547&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The database you should input the database you want to work on.&lt;br&gt;
In this example, the database I wanted to use for analysis was &lt;code&gt;luxsales&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3qoh869l5aabpvhgqs60.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3qoh869l5aabpvhgqs60.png" alt="Diff databases" width="377" height="506"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Choose Import mode for data connectivity.&lt;/li&gt;
&lt;li&gt;Click OK.&lt;/li&gt;
&lt;li&gt;Power BI will then prompt for authentication.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Step 3&lt;/em&gt;: &lt;strong&gt;Download the SSL Certificate&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most cloud databases require secure SSL connections.&lt;/p&gt;

&lt;p&gt;You may encounter this error while trying to connect:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv8oyg3zpp2g5m8s41gjd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv8oyg3zpp2g5m8s41gjd.png" alt="Error Message" width="536" height="250"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This occurs because Aiven enforces secure SSL connections.&lt;/p&gt;

&lt;p&gt;Database tools such as DBeaver automatically trust the certificate, but Power BI requires manual validation.&lt;/p&gt;

&lt;p&gt;To resolve this, download and install the SSL certificate provided by Aiven.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Step 3.a: Download the Certificate from Aiven&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to console.aiven.io&lt;/li&gt;
&lt;li&gt;Open your PostgreSQL service&lt;/li&gt;
&lt;li&gt;Navigate to the Overview tab&lt;/li&gt;
&lt;li&gt;Scroll to Connection Information&lt;/li&gt;
&lt;li&gt;Click Download CA Certificate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This downloads a file named:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ca.pem&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Step 3.b: Convert the Certificate Format&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Windows requires certificates in .crt format.&lt;/p&gt;

&lt;p&gt;Locate the downloaded file (usually in Downloads).&lt;/p&gt;

&lt;p&gt;Rename the file from:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ca.pem&lt;/code&gt; to &lt;code&gt;ca.crt&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;To do this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Right-click the file&lt;/li&gt;
&lt;li&gt;Select Rename&lt;/li&gt;
&lt;li&gt;Change the extension to .crt&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If Windows hides extensions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select Save as type → All Files&lt;/li&gt;
&lt;li&gt;Manually type: ca.crt&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Step 3.c: Install the Certificate in Windows&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Double-click the ca.crt file.&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Install Certificate&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;Local Machine&lt;/strong&gt;, then click Next.&lt;/li&gt;
&lt;li&gt;Choose Place all certificates in the following store.&lt;/li&gt;
&lt;li&gt;Click Browse.&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;Trusted Root Certification Authorities&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Click OK → Next → Finish.&lt;/li&gt;
&lt;li&gt;You should see the confirmation message: &lt;em&gt;The import was successful.&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Step 3.d: Restart Power BI and Reconnect&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;After installing the certificate, close Power BI Desktop completely.&lt;/li&gt;
&lt;li&gt;Reopen Power BI.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Connect again using the same PostgreSQL connection details.&lt;/p&gt;

&lt;p&gt;Server: &lt;code&gt;pg-3b04de5f-millicentisika-2259.j.aivencloud.com:11547&lt;/code&gt;&lt;br&gt;
Database: &lt;code&gt;luxsales&lt;/code&gt;&lt;br&gt;
Mode: &lt;code&gt;Import&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Once the connection is successful, Power BI will display a Navigator window where you can select tables and load them for analysis.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This works whether the PostgreSQL is on local computer or on cloud service like Aiven. The certificate installation simply allows your computer to trust secure connections from Aiven’s servers. Power BI then connects directly to the cloud PostgreSQL database over the internet.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  4. Loading Tables and Building the Data Model
&lt;/h2&gt;

&lt;p&gt;Once connected to the database, the next step is to load the required tables and establish relationships between them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F76bhl9iax8ldsogtcu26.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F76bhl9iax8ldsogtcu26.png" alt="Load Data" width="800" height="633"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Select Tables in the Navigator
&lt;/h3&gt;

&lt;p&gt;After connecting to PostgreSQL (local or cloud), the Navigator window displays all available tables in the database.&lt;/p&gt;

&lt;p&gt;For this example, we will load the following tables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;customers&lt;/code&gt; &lt;/li&gt;
&lt;li&gt;
&lt;code&gt;products&lt;/code&gt; &lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sales&lt;/code&gt; &lt;/li&gt;
&lt;li&gt;
&lt;code&gt;inventory&lt;/code&gt; &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Select all four tables and click &lt;strong&gt;Load&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Understanding the Data Model
&lt;/h3&gt;

&lt;p&gt;Once loaded, Power BI stores the tables in its data model. To view and manage relationships between tables, navigate to the &lt;strong&gt;Model View&lt;/strong&gt; by clicking the model icon on the left sidebar.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What Is a Data Model&lt;/em&gt;?&lt;br&gt;
A data model defines how tables relate to each other. &lt;br&gt;
In a well-designed model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each table has a &lt;strong&gt;primary key&lt;/strong&gt; (a unique identifier for each row)&lt;/li&gt;
&lt;li&gt;Tables are connected through &lt;strong&gt;foreign keys&lt;/strong&gt; (columns that reference primary keys in other tables)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;sales&lt;/code&gt; table contains a &lt;code&gt;customer_id&lt;/code&gt; column that links to the &lt;code&gt;customer_id&lt;/code&gt; in the &lt;code&gt;customers&lt;/code&gt; table&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;sales&lt;/code&gt; table contains a &lt;code&gt;product_id&lt;/code&gt; column that links to the &lt;code&gt;product_id&lt;/code&gt; in the &lt;code&gt;products&lt;/code&gt; table&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 3: Creating Relationships in Power BI
&lt;/h3&gt;

&lt;p&gt;Power BI automatically detects some relationships, but you may need to create or modify them manually.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F800f0ifzh6qvxhtwq74g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F800f0ifzh6qvxhtwq74g.png" alt="Data Rship" width="800" height="346"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Common relationships in this model:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;sales.customer_id&lt;/code&gt; → &lt;code&gt;customers.customer_id&lt;/code&gt; (many-to-one)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sales.product_id&lt;/code&gt; → &lt;code&gt;products.product_id&lt;/code&gt; (many-to-one)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;inventory.product_id&lt;/code&gt; → &lt;code&gt;products.product_id&lt;/code&gt; (one-to-one or many-to-one)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Relationship Cardinality&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each relationship has a &lt;strong&gt;cardinality&lt;/strong&gt; that defines how rows in one table relate to rows in another:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;One-to-Many (1:*)&lt;/strong&gt;: One customer can have many sales transactions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Many-to-One (*:1)&lt;/strong&gt;: Many sales belong to one customer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One-to-One (1:1)&lt;/strong&gt;: One product has one inventory record (less common)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Power BI displays the cardinality on the relationship line with symbols like &lt;code&gt;1&lt;/code&gt; and &lt;code&gt;*&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Relationships Matter&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Properly defined relationships enable Power BI to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Filter correctly across tables&lt;/strong&gt;: When you select a customer, Power BI automatically shows only their sales&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Aggregate accurately&lt;/strong&gt;: Calculating total sales per customer requires the relationship between &lt;code&gt;sales&lt;/code&gt; and &lt;code&gt;customers&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoid duplicate counts&lt;/strong&gt;: Without relationships, metrics like total revenue might be calculated incorrectly&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Star schema is the most common modeling pattern and looks like this&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;      Customers&lt;br&gt;
         |&lt;br&gt;
         |&lt;br&gt;
     Sales (Fact Table - Center)&lt;br&gt;
       /   \&lt;br&gt;
      /     \&lt;br&gt;
Products   Inventory&lt;br&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
&lt;br&gt;
  &lt;br&gt;
  

&lt;ol&gt;
&lt;li&gt;Why SQL Skills Are Important for Power BI Analysts
&lt;/li&gt;
&lt;/ol&gt;
&lt;/h2&gt;


&lt;p&gt;Although Power BI provides powerful visualization tools, SQL skills remain essential for analysts. SQL enables them to efficiently retrieve, manipulate, and prepare data stored in relational databases before using it in visualization tools like Power BI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key SQL capabilities include&lt;/strong&gt;:&lt;br&gt;
i) Retrieving Data&lt;br&gt;
Instead of exporting entire datasets, analysts can use queries such as &lt;strong&gt;SELECT&lt;/strong&gt; to extract only the columns and rows relevant to their analysis.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;code&gt;SELECT *&lt;br&gt;
FROM sales&lt;br&gt;
WHERE order_date &amp;gt;= '2025-01-01';&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;ii) Filtering Data&lt;br&gt;
SQL allows analysts to filter large datasets before importing them into Power BI. &lt;br&gt;
Using conditions like &lt;strong&gt;WHERE&lt;/strong&gt;, analysts can narrow results based on specific criteria such as dates, locations, or product categories.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;code&gt;SELECT *&lt;br&gt;
FROM customers&lt;br&gt;
WHERE country = 'Kenya';&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;iii) Aggregating Data&lt;br&gt;
Analysts often need summary insights rather than raw data. &lt;/p&gt;

&lt;p&gt;SQL provides functions such as &lt;strong&gt;SUM(), COUNT(), AVG(), MIN(), and MAX()&lt;/strong&gt; to calculate totals, averages, and other statistical measures.&lt;/p&gt;

&lt;p&gt;When combined with &lt;strong&gt;GROUP BY&lt;/strong&gt;, these functions allow analysts to summarize data by categories such as product type, department, or sales region.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;code&gt;SELECT product_id, SUM(amount) AS total_sales&lt;br&gt;
FROM sales&lt;br&gt;
GROUP BY product_id;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;iv) Preparing Data for Visualization&lt;br&gt;
SQL helps transform raw tables into structured datasets suitable for dashboards.&lt;br&gt;
Examples include:&lt;br&gt;
• Joining tables&lt;br&gt;
• Creating calculated fields&lt;br&gt;
• Building views&lt;br&gt;
• Filtering unnecessary records&lt;/p&gt;

&lt;p&gt;By performing these steps directly within the database, SQL ensures that the data being imported into dashboard tools like Power BI is already clean, organized, and ready for analysis. &lt;/p&gt;

&lt;p&gt;This not only improves performance but also allows analysts to build more accurate and efficient dashboards.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Connecting Power BI to a SQL database allows analysts to move beyond static spreadsheets and work directly with structured, scalable data sources. &lt;/p&gt;

&lt;p&gt;Whether the database is running locally or hosted in the cloud using services like Aiven, the process follows the same process: connect to the database, load the relevant tables, model the relationships, and build visual insights.&lt;/p&gt;

&lt;p&gt;By combining SQL for data preparation and Power BI for visualization, analysts can build powerful dashboards that turn raw data into actionable business insights.&lt;/p&gt;

</description>
      <category>sql</category>
      <category>postgres</category>
      <category>database</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Advanced SQL Techniques Every Data Analyst Should Know</title>
      <dc:creator>Isika Millicent</dc:creator>
      <pubDate>Fri, 06 Mar 2026 16:27:28 +0000</pubDate>
      <link>https://dev.to/isika_millicent/advanced-sql-techniques-every-data-analyst-should-know-nc6</link>
      <guid>https://dev.to/isika_millicent/advanced-sql-techniques-every-data-analyst-should-know-nc6</guid>
      <description>&lt;h2&gt;
  
  
  In this article we will cover:
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Advanced Aggregations&lt;/li&gt;
&lt;li&gt;Advanced Set Operations&lt;/li&gt;
&lt;li&gt;Window Functions&lt;/li&gt;
&lt;li&gt;CTEs and Query Structuring&lt;/li&gt;
&lt;li&gt;Subqueries&lt;/li&gt;
&lt;li&gt;EXISTS vs NOT EXISTS&lt;/li&gt;
&lt;li&gt;Query Optimization&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Most real-world insights come from combining multiple SQL techniques.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;While a basic understanding of SQL e.g. selecting columns, filtering rows, and joining tables is essential for anyone working with data, mastering advanced SQL techniques is what truly separates a new data analyst from an expert data analyst. These advanced skills unlock deeper insights, help solve complex business problems, and ultimately enable more confident, data-driven decision-making.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you have spent any time working with data, you are probably already familiar with the fundamentals of SQL: pulling rows from a table &lt;strong&gt;(SELECT)&lt;/strong&gt;, filtering results using conditions &lt;strong&gt;(WHERE)&lt;/strong&gt;, and perhaps joining two tables together &lt;strong&gt;(JOIN)&lt;/strong&gt;. That foundation is important.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But the moment you start working as a data analyst, the questions change. Suddenly you're asked things like:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Which customers made a purchase this month but haven’t returned since?&lt;br&gt;
For each sales representative, what was their running total of revenue by the end of every quarter?&lt;br&gt;
Which product categories fall within the top 10% of sales performance?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Simple SQL isn't enough anymore. That’s where &lt;strong&gt;advanced SQL techniques&lt;/strong&gt; come in.&lt;/p&gt;
&lt;h2&gt;
  
  
  1. Advanced Aggregations: CASE WHEN, FILTER, and GROUPING SETS
&lt;/h2&gt;

&lt;p&gt;Aggregation in SQL goes far beyond COUNT() and SUM(). &lt;br&gt;
Business reporting often requires conditional aggregation, multiple grouping levels, and flexible summarization.&lt;br&gt;
Three tools make this much easier: CASE WHEN, FILTER &amp;amp; GROUPING SETS&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;a. CASE WHEN&lt;/strong&gt; is used for conditional logic inside a query. It allows you to compute conditional aggregates — for example, counting only orders above a certain value, or separating revenue by customer tier within the same row:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Think of it as SQL’s version of an &lt;code&gt;if/else&lt;/code&gt; statement.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Example:  Categorize customers based on spending.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SELECT customer_name, total_spent,&lt;br&gt;
CASE WHEN total_spent &amp;gt; 1000 THEN 'High Value'&lt;br&gt;
WHEN total_spent &amp;gt; 500 THEN 'Medium Value'&lt;br&gt;
ELSE 'Low Value'&lt;br&gt;
END AS customer_category&lt;br&gt;
FROM customers;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;customer  total_spent  category&lt;br&gt;
Alice        1200   High Value&lt;br&gt;
Brian        650        Medium Value&lt;br&gt;
John         200       Low Value&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;b. FILTER&lt;/strong&gt; -is a cleaner alternative to conditional aggregation.&lt;br&gt;
Instead of writing CASE WHEN, you attach a filter directly to the aggregate.&lt;/p&gt;

&lt;p&gt;Example&lt;br&gt;
&lt;code&gt;SELECT COUNT(*) FILTER (WHERE region = 'Kenya') AS kenya_customers,&lt;br&gt;
COUNT(*) FILTER (WHERE region = 'Uganda') AS uganda_customers&lt;br&gt;
FROM customers;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;c. GROUPING SETS&lt;/strong&gt; - This is more advanced and extremely powerful.&lt;br&gt;
It allows you to calculate multiple GROUP BY aggregations in one query.&lt;br&gt;
Normally this requires three queries + UNION.&lt;/p&gt;

&lt;p&gt;Example&lt;br&gt;
&lt;code&gt;region    product  sales&lt;br&gt;
Kenya   Laptop   200&lt;br&gt;
Kenya   Phone    300&lt;br&gt;
Uganda  Laptop   150&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Suppose you want: Sales by region, Sales by product and the Grand total&lt;br&gt;
Normally you'd write 3 queries.&lt;/p&gt;

&lt;p&gt;But with GROUPING SETS:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SELECT region, product, SUM(sales) AS total_sales&lt;br&gt;
FROM sales&lt;br&gt;
GROUP BY GROUPING SETS ((region), (product),());&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;region      product total_sales&lt;br&gt;
Kenya     NULL     500&lt;br&gt;
Uganda    NULL     150&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This kind of aggregation is frequently requested by business stakeholders who want a single summary table rather than multiple separate queries.&lt;/p&gt;
&lt;h2&gt;
  
  
  2. Advanced Set Operations (UNION, UNION ALL, INTERSECT, EXCEPT)
&lt;/h2&gt;

&lt;p&gt;Set operations combine the results of two SELECT queries.&lt;br&gt;
They require:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Same number of columns&lt;/li&gt;
&lt;li&gt;Compatible data types&lt;/li&gt;
&lt;li&gt;Same column order&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;A. UNION/UNION ALL&lt;/strong&gt;&lt;br&gt;
It combines results from two queries and removes duplicates automatically&lt;/p&gt;

&lt;p&gt;Example Tables&lt;/p&gt;

&lt;p&gt;Table: customers_2024&lt;br&gt;
customer_id name&lt;br&gt;
1           Alice&lt;br&gt;
2           Brian&lt;br&gt;
3           John&lt;/p&gt;

&lt;p&gt;Table: customers_2025&lt;br&gt;
customer_id name&lt;br&gt;
3           John&lt;br&gt;
4           Mary&lt;br&gt;
5          James&lt;/p&gt;

&lt;p&gt;QUERY&lt;br&gt;
&lt;code&gt;SELECT name FROM customers_2024&lt;br&gt;
UNION&lt;br&gt;
SELECT name FROM customers_2025;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;RESULT&lt;br&gt;
name&lt;br&gt;
Alice&lt;br&gt;
Brian&lt;br&gt;
John&lt;br&gt;
Mary&lt;br&gt;
James&lt;/p&gt;

&lt;p&gt;Think of them as working with result sets, not tables directly.&lt;br&gt;
NB/ John appears in both tables &lt;br&gt;
&lt;strong&gt;UNION&lt;/strong&gt; removes the duplicate&lt;br&gt;
&lt;strong&gt;UNION ALL&lt;/strong&gt; keeps all the duplicate values and is faster&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;B. INTERSECT&lt;/strong&gt;&lt;br&gt;
This returns only the rows that exist in both queries&lt;/p&gt;

&lt;p&gt;When to Use INTERSECT&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identifying returning customers&lt;/li&gt;
&lt;li&gt;Matching common records between systems&lt;/li&gt;
&lt;li&gt;Data validation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example use case:&lt;br&gt;
Customers who bought in both 2024 and 2025. Using the same tables:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SELECT name FROM customers_2024&lt;br&gt;
INTERSECT&lt;br&gt;
SELECT name FROM customers_2025;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Result&lt;br&gt;
name&lt;br&gt;
John&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;C. EXCEPT&lt;/strong&gt;&lt;br&gt;
It returns rows from the first query that do NOT exist in the second query.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Think of it as: “Show me what's in A but not in B”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;code&gt;SELECT name FROM customers_2024&lt;br&gt;
EXCEPT&lt;br&gt;
SELECT name FROM customers_2025;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Result&lt;br&gt;
name&lt;br&gt;
Alice&lt;br&gt;
Brian&lt;/p&gt;

&lt;p&gt;When to Use EXCEPT&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Finding new customers&lt;/li&gt;
&lt;li&gt;Detecting missing records&lt;/li&gt;
&lt;li&gt;Comparing two datasets&lt;/li&gt;
&lt;li&gt;Data auditing&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Question: When would you use EXCEPT instead of LEFT JOIN?&lt;br&gt;
I use EXCEPT when I want to compare two result sets directly and identify records present in one dataset but not another, especially for data validation tasks.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  3. Window Functions
&lt;/h2&gt;

&lt;p&gt;Window functions are one of the most powerful tools in SQL analytics.&lt;br&gt;
They allow you to perform calculations across rows while still keeping every row in the result.&lt;/p&gt;

&lt;p&gt;Example problem: You want to see each employee's sales alongside their department's total sales.&lt;/p&gt;

&lt;p&gt;A normal GROUP BY would collapse rows.&lt;/p&gt;

&lt;p&gt;Window functions avoid that.&lt;/p&gt;

&lt;p&gt;The syntax centers on the OVER() clause, which defines the 'window' — the set of rows the function considers:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SELECT employee_name, department, sales_amount,&lt;br&gt;
SUM(sales_amount) OVER (PARTITION BY department) AS dept_total,&lt;br&gt;
ROUND(sales_amount * 100.0 / SUM(sales_amount) &lt;br&gt;
OVER (PARTITION BY department), 2) &lt;br&gt;
AS pct_of_dept&lt;br&gt;
FROM employee_sales;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Here, PARTITION BY department tells SQL to calculate the sum separately for each department. &lt;br&gt;
Every row for the Marketing team gets Marketing's total; every row for Engineering gets Engineering's total. No rows are removed; no separate subquery is needed.&lt;/p&gt;
&lt;h3&gt;
  
  
  Window functions become even more powerful with ranking.
&lt;/h3&gt;

&lt;p&gt;The ROW_NUMBER (), RANK (), and DENSE_RANK () functions let you rank records within groups.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ROW_NUMBER() - gives every row a unique number.
Even if two rows have the same value, they still get different numbers.&lt;/li&gt;
&lt;li&gt;RANK() - gives the same rank to tied values, but it skips numbers after ties.&lt;/li&gt;
&lt;li&gt;DENSE_RANK() - also gives the same rank for ties, but does NOT skip numbers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example table: employees&lt;br&gt;
&lt;code&gt;name    salary&lt;br&gt;
Alice     9000&lt;br&gt;
Bob   8000&lt;br&gt;
Carol     8000&lt;br&gt;
David     7000&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Query 1: Row_number()&lt;br&gt;
&lt;code&gt;SELECT name, salary,&lt;br&gt;
ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num&lt;br&gt;
FROM employees;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Result&lt;br&gt;
&lt;code&gt;name  salary  row_num&lt;br&gt;
Alice   9000    1&lt;br&gt;
Bob 8000    2&lt;br&gt;
Carol   8000    3&lt;br&gt;
David   7000    4&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;NB/&lt;br&gt;
Bob and Carol have the same salary, but they still get different row numbers&lt;br&gt;
So, ROW_NUMBER() forces each row to be unique.&lt;/p&gt;

&lt;p&gt;Query 2: RANK()&lt;br&gt;
&lt;code&gt;SELECT name, salary,&lt;br&gt;
RANK() OVER (ORDER BY salary DESC) AS rank_num&lt;br&gt;
FROM employees;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Result&lt;br&gt;
&lt;code&gt;name  salary  rank_num&lt;br&gt;
Alice   9000    1&lt;br&gt;
Bob 8000    2&lt;br&gt;
Carol   8000    2&lt;br&gt;
David   7000    4&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;NB/&lt;br&gt;
Bob and Carol tie at rank 2,then SQL skips rank 3 and goes to 4.&lt;/p&gt;

&lt;p&gt;Query 3: DENSE_RANK()&lt;br&gt;
&lt;code&gt;SELECT name, salary,&lt;br&gt;
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank&lt;br&gt;
FROM employees;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Result&lt;br&gt;
&lt;code&gt;name  salary  dense_rank&lt;br&gt;
Alice   9000    1&lt;br&gt;
Bob 8000    2&lt;br&gt;
Carol   8000    2&lt;br&gt;
David   7000    3&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;NB/ No number is skipped.&lt;/p&gt;

&lt;p&gt;This is just one example of the power of window functions. They can be used for other analytical tasks, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ranking: NTILE () to find the top N products by sales in each category.&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Lead/Lag Analysis: LEAD (), LAG () to compare a value with the value from a subsequent or preceding row, for example, to calculate month-over-month growth.&lt;br&gt;
Example &lt;br&gt;
&lt;code&gt;SELECT month, revenue, revenue - LAG(revenue) OVER (ORDER BY month) AS growth&lt;br&gt;
FROM monthly_sales;&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Running Totals: SUM() with an appropriate frame clause to calculate cumulative sales over time. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;A classic business use case is identifying the top-performing product in each category&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;code&gt;SELECT *&lt;br&gt;
FROM (SELECT product_name, category, revenue,&lt;br&gt;
RANK() OVER (PARTITION BY category ORDER BY revenue DESC) AS rnk&lt;br&gt;
FROM product_sales)&lt;br&gt;
WHERE rnk = 1;&lt;/code&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  4. Common Table Expressions (CTEs)
&lt;/h2&gt;

&lt;p&gt;A CTE is a temporary, named result set that you can reference within a subsequent SELECT, INSERT, UPDATE, or DELETE statement. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Think of it as creating a temporary, virtual table that exists only for the duration of your query.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;CTEs provide a way to break down a complex query into logical, readable steps. By giving a name to each step of your analysis, you can make your code self-documenting and easier to debug.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Understanding the Syntax&lt;/strong&gt;&lt;br&gt;
You start with the &lt;strong&gt;WITH&lt;/strong&gt; keyword, followed by the name of the CTE, and then the &lt;strong&gt;AS&lt;/strong&gt; keyword with the query that defines the CTE enclosed in parentheses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Business Case Scenario: Analyzing Customer Order Data&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Let's say you are an analyst for an online retailer, and you have been tasked with identifying the top 5 customers by total spending in the "Kenyan" region for the year 2025. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Attempting to do this with nested subqueries would result in a hard-to-read query. Instead, we can use CTEs to break down the problem into clear, logical steps.&lt;/p&gt;

&lt;p&gt;Assume you have the following tables:&lt;/p&gt;

&lt;p&gt;Table 1: customers&lt;br&gt;
`customer_id     customer_name    region&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;              John Smith    Kenya&lt;/li&gt;
&lt;li&gt;              Jane Doe      Uganda`&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Table 2: orders&lt;br&gt;
&lt;code&gt;order_id  customer_id order_date  order_total&lt;br&gt;
101                1    2025-01-15  250.00&lt;br&gt;
102                2    2025-02-20  150.75&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Here is how you can solve this problem using CTEs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WITH orders_2025 AS (
    --Step 1: Filter orders from the year 2025--
    SELECT order_id, customer_id, order_total
    FROM orders
    WHERE EXTRACT(YEAR FROM order_date) = 2025),

kenya_customer_orders AS (
    --Step 2 &amp;amp; 3: Join with customers and filter for Kenyan region--
    SELECT c.customer_id, c.customer_name, o.order_total
    FROM orders_2025 AS o
    JOIN customers AS c ON o.customer_id = c.customer_id
    WHERE c.region = 'Kenya'),

customer_total_spending AS (
     --Step 4: Calculate total spending for each customer--
    SELECT customer_id, customer_name, 
    SUM(order_total) AS total_spending
    FROM kenya_customer_orders
    GROUP BY customer_id, customer_name)
 _Step 5: Rank customers and select the top 5_
SELECT customer_name, total_spending
FROM customer_total_spending
ORDER BY total_spending DESC
LIMIT 5;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, the query is much more readable and self-explanatory. Each CTE has a clear purpose, and the final SELECT statement is simple and easy to understand. If you needed to debug this query, you could easily test each CTE independently to verify its output&lt;/p&gt;

&lt;p&gt;CTEs also support recursion, which is useful for hierarchical data structures like org charts, product categories with parent-child relationships, or network graphs. &lt;/p&gt;

&lt;p&gt;Example Table &lt;br&gt;
CEO&lt;br&gt;
 ├─ Alice&lt;br&gt;
 │   ├─ Carol&lt;br&gt;
 │   └─ David&lt;br&gt;
 └─ Bob&lt;br&gt;
     └─ Emma&lt;/p&gt;

&lt;p&gt;Question: Find all employees under a specific manager.&lt;/p&gt;

&lt;p&gt;employee_id name    manager_id&lt;br&gt;
1             CEO   NULL&lt;br&gt;
2           Alice   1&lt;br&gt;
3           Bob 1&lt;br&gt;
4           Carol   2&lt;br&gt;
5           David   2&lt;br&gt;
6           Emma    3&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WITH RECURSIVE employee_tree AS (
    -- start from manager
SELECT employee_id, name, manager_id
FROM employees
WHERE employee_id = 1
    UNION ALL
    -- find subordinates
SELECT e.employee_id, e.name, e.manager_id
FROM employees e
JOIN employee_tree et
ON e.manager_id = et.employee_id)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;SELECT *&lt;br&gt;
FROM employee_tree;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Example: Sales Analytics Query&lt;/p&gt;

&lt;p&gt;Imagine this table:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;order_id  customer    region    order_total   order_date&lt;br&gt;
1              Alice    Kenya        500    2025-01-05&lt;br&gt;
2              Brian    Kenya        700    2025-02-02&lt;br&gt;
3              James    Uganda       300    2025-02-15&lt;br&gt;
4              Alice    Kenya        400    2025-03-01&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Goal:&lt;br&gt;
Total spending per customer&lt;br&gt;
Kenyan vs non-Kenyan totals&lt;br&gt;
Rank customers by spending&lt;/p&gt;

&lt;p&gt;&lt;em&gt;--Step 1 — CTE (clean and organize the orders from 2025) --&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;WITH customer_orders AS (&lt;br&gt;
SELECT customer, region, order_total&lt;br&gt;
FROM orders&lt;br&gt;
WHERE EXTRACT(YEAR FROM order_date) = 2025)&lt;/p&gt;

&lt;p&gt;&lt;em&gt;--Step 2 — FILTER (conditional aggregation)--&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;SELECT customer, SUM(order_total) AS total_spent,&lt;br&gt;
   SUM(order_total) FILTER (WHERE region = 'Kenya') AS kenya_spending,&lt;br&gt;
    SUM(order_total) FILTER (WHERE region != 'Kenya') AS international_spending&lt;/p&gt;

&lt;p&gt;&lt;em&gt;--Without FILTER, you'd need multiple CASE WHEN statements.--&lt;br&gt;
--Step 3 — Window function (ranking)--&lt;/em&gt;&lt;br&gt;
RANK() OVER (ORDER BY SUM(order_total) DESC) AS spending_rank&lt;/p&gt;

&lt;p&gt;--This ranks customers based on their total spending.--&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;--Final Combined Query--&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WITH customer_orders AS (
SELECT customer, region, order_total
FROM orders
WHERE EXTRACT(YEAR FROM order_date) = 2025)

SELECT customer, SUM(order_total) AS total_spent,
SUM(order_total) FILTER (WHERE region = 'Kenya') AS kenya_spending,
SUM(order_total) FILTER (WHERE region != 'Kenya') AS international_spending,
RANK() OVER (ORDER BY SUM(order_total) DESC) AS spending_rank

FROM customer_orders
GROUP BY customer;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of writing 5 queries, you get everything in one query.&lt;/p&gt;

&lt;p&gt;Mental model - The flow usually looks like this:&lt;br&gt;
&lt;code&gt;CTE&lt;br&gt;
↓&lt;br&gt;
Aggregate with FILTER&lt;br&gt;
↓&lt;br&gt;
Analyze with WINDOW FUNCTIONS&lt;br&gt;
↓&lt;br&gt;
Final output&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Subquery
&lt;/h2&gt;

&lt;p&gt;This is simply a query inside another query.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It’s like saying: First calculate this… then use that result to answer the main question.&lt;br&gt;
They can be used in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SELECT&lt;/li&gt;
&lt;li&gt;FROM&lt;/li&gt;
&lt;li&gt;WHERE&lt;/li&gt;
&lt;li&gt;HAVING&lt;/li&gt;
&lt;li&gt;CORRELATED SUBQUERIES&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;a. Subquery in the WHERE Clause (Most Common)&lt;/strong&gt;&lt;br&gt;
Used for filtering.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
Find employees who earn more than the average salary.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SELECT name, salary&lt;br&gt;
FROM employees&lt;br&gt;
WHERE salary &amp;gt; (SELECT AVG(salary) FROM employees);&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;So first, Inner query calculates average salary. Outer query filters employees above that value.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;b. Subquery in the FROM Clause (Derived Table)&lt;/strong&gt;&lt;br&gt;
Here, the subquery acts like a temporary table.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;Find departments with average salary above 50,000.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SELECT department_id, avg_salary&lt;br&gt;
FROM (&lt;br&gt;
   SELECT department_id, AVG(salary) AS avg_salary&lt;br&gt;
   FROM employees&lt;br&gt;
   GROUP BY department_id) AS dept_avg&lt;br&gt;
WHERE avg_salary &amp;gt; 50000;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;c. Subquery in the SELECT Clause&lt;/strong&gt;&lt;br&gt;
Used to return a calculated value per row.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
Show each employee with the company average salary.&lt;br&gt;
&lt;code&gt;SELECT name, salary,&lt;br&gt;
(SELECT AVG(salary) FROM employees) AS company_avg&lt;br&gt;
FROM employees;&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;*&lt;em&gt;d. Subquery in the HAVING Clause *&lt;/em&gt;&lt;br&gt;
It is used when you want to filter aggregated results using another query.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;WHERE filters rows before aggregation&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;HAVING filters groups after aggregation&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So a subquery in HAVING usually compares one group's aggregate with another aggregate value. &lt;/p&gt;

&lt;p&gt;&lt;code&gt;SELECT department_id, AVG(salary) AS dept_avg&lt;br&gt;
FROM employees&lt;br&gt;
GROUP BY department_id&lt;br&gt;
HAVING AVG(salary) &amp;gt; (&lt;br&gt;
    SELECT AVG(salary)&lt;br&gt;
    FROM employees);&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Real Business Case: Find products whose total sales are greater than the average product sales.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SELECT product_id, SUM(sales) AS total_sales&lt;br&gt;
FROM orders&lt;br&gt;
GROUP BY product_id&lt;br&gt;
HAVING SUM(sales) &amp;gt; (SELECT AVG(product_sales)&lt;br&gt;
    FROM(SELECT SUM(sales) AS product_sales&lt;br&gt;
        FROM orders&lt;br&gt;
        GROUP BY product_id) t);&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;e. Correlated Subqueries&lt;/strong&gt;&lt;br&gt;
A correlated subquery depends on the outer query. It references a column from the outer query and runs once per row.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
Find employees who earn more than their department average.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SELECT e1.name, e1.salary, e1.department_id&lt;br&gt;
FROM employees e1&lt;br&gt;
WHERE salary &amp;gt; (&lt;br&gt;
    SELECT AVG(e2.salary)&lt;br&gt;
    FROM employees e2&lt;br&gt;
    WHERE e2.department_id = e1.department_id&lt;br&gt;
);&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Finding Relationships with EXISTS and NOT EXISTS
&lt;/h2&gt;

&lt;p&gt;EXISTS checks if at least one row exists. It ignores the actual column values.&lt;/p&gt;

&lt;p&gt;Use the 2 when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Checking relationships&lt;/li&gt;
&lt;li&gt;Finding missing records&lt;/li&gt;
&lt;li&gt;Data validation&lt;/li&gt;
&lt;li&gt;Complex filtering&lt;/li&gt;
&lt;li&gt;Working with large datasets&lt;/li&gt;
&lt;li&gt;Avoiding NULL issues&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example &lt;br&gt;
Table 1: customers&lt;br&gt;
&lt;code&gt;customer_id   customer_name&lt;br&gt;
1            Alice&lt;br&gt;
2            Bob&lt;br&gt;
3            Carol&lt;br&gt;
4            David&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Table 2: orders&lt;/p&gt;

&lt;p&gt;&lt;code&gt;order_id   customer_id    amount&lt;br&gt;
101         1    500&lt;br&gt;
102         1        300&lt;br&gt;
103         2    700&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Now let’s use them.&lt;br&gt;
&lt;strong&gt;i. EXISTS&lt;/strong&gt;&lt;br&gt;
Find customers who have placed at least one order.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SELECT c.customer_id, c.customer_name&lt;br&gt;
FROM customers c&lt;br&gt;
WHERE EXISTS (&lt;br&gt;
 SELECT 1&lt;br&gt;
 FROM orders o&lt;br&gt;
 WHERE o.customer_id = c.customer_id);&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Result:&lt;br&gt;
&lt;code&gt;customer_id   customer_name&lt;br&gt;
1           Alice&lt;br&gt;
2           Bob&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ii. NOT EXISTS&lt;/strong&gt;&lt;br&gt;
Now let’s find customers who NEVER placed an order.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SELECT c.customer_id, c.customer_name&lt;br&gt;
FROM customers c&lt;br&gt;
WHERE NOT EXISTS (&lt;br&gt;
    SELECT 1&lt;br&gt;
    FROM orders o&lt;br&gt;
    WHERE o.customer_id = c.customer_id);&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;NOTE BETTER: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The select 1 in both just stands in as a placeholder, you can replace it with any value. It does NOT read the column&lt;/li&gt;
&lt;li&gt;NOT EXISTS (recommended) OR LEFT JOIN + IS NULL - give same result&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  7. Query Optimization
&lt;/h2&gt;

&lt;p&gt;Writing SQL that produces the correct answer is one thing.&lt;br&gt;
Writing SQL that runs efficiently on millions of rows is another skill entirely.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Query optimization is the process of writing SQL in a way that runs faster, uses less memory, and scales better on large data.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Key techniques:&lt;/p&gt;

&lt;p&gt;If you frequently filter or join on a column, it’s likely a good candidate.&lt;br&gt;
Example:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SELECT *&lt;br&gt;
FROM orders o&lt;br&gt;
JOIN customers c&lt;br&gt;
ON o.customer_id = c.customer_id;&lt;/code&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;customer_id should likely be indexed.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;How Databases Actually Work&lt;/strong&gt;&lt;br&gt;
When you run a query:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SQL parses it (checks syntax).&lt;/li&gt;
&lt;li&gt;The query optimizer analyzes it.&lt;/li&gt;
&lt;li&gt;It chooses the best execution plan.&lt;/li&gt;
&lt;li&gt;Then it runs it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The optimizer decides things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Should it use an index?&lt;/li&gt;
&lt;li&gt;Should it scan the whole table?&lt;/li&gt;
&lt;li&gt;Which join order is best?&lt;/li&gt;
&lt;li&gt;Should it use a hash join or nested loop?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You don’t see this — but it’s happening.&lt;/p&gt;

&lt;p&gt;Key principles include:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Use EXPLAIN or EXPLAIN ANALYZE
This reveals how the database executes your query.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;2. Use indexes on frequently filtered columns&lt;/strong&gt;&lt;br&gt;
Indexes are like a book’s table of contents. Without index, database scans every row (slow) but with index, database jumps directly to matching rows (fast).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Avoid SELECT * in production queries&lt;/strong&gt;&lt;br&gt;
Just select necessary columns because less data transferred and less memory used hence faster execution&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Filter early using WHERE clauses&lt;/strong&gt;&lt;br&gt;
Filtering early reduces the number of rows processed.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Avoid functions on indexed columns&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Example:&lt;br&gt;
Works but not as good:&lt;br&gt;
&lt;code&gt;WHERE YEAR(order_date) = 2024&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Better:&lt;br&gt;
&lt;code&gt;WHERE order_date &amp;gt;= '2024-01-01'&lt;br&gt;
AND order_date &amp;lt; '2025-01-01'&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;NB/ Indexes are NOT helpful for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Small tables&lt;/li&gt;
&lt;li&gt;Queries returning most of the table&lt;/li&gt;
&lt;li&gt;Columns with very low uniqueness (like gender: M/F)&lt;/li&gt;
&lt;li&gt;Heavy writes (because indexes slow down inserts/updates)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Discussion
&lt;/h2&gt;

&lt;p&gt;What SQL concept took you the longest to understand?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Window functions?&lt;/li&gt;
&lt;li&gt;CTEs?&lt;/li&gt;
&lt;li&gt;Subqueries?&lt;/li&gt;
&lt;li&gt;Something else?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let me know. &lt;/p&gt;

</description>
      <category>sql</category>
      <category>dataanalytics</category>
      <category>database</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Why Data Modeling Matters in Power BI: A Beginner’s Guide to Schemas, Facts, and Dimensions</title>
      <dc:creator>Isika Millicent</dc:creator>
      <pubDate>Sun, 01 Feb 2026 22:53:11 +0000</pubDate>
      <link>https://dev.to/isika_millicent/why-data-modeling-matters-in-power-bi-a-beginners-guide-to-schemas-facts-and-dimensions-4njl</link>
      <guid>https://dev.to/isika_millicent/why-data-modeling-matters-in-power-bi-a-beginners-guide-to-schemas-facts-and-dimensions-4njl</guid>
      <description>&lt;h2&gt;
  
  
  Power BI Is Not About Charts - It’s About Data Modeling
&lt;/h2&gt;

&lt;p&gt;When people first open Power BI, they often focus on charts, dashboards, and visuals. That’s understandable, visuals are what you see first.&lt;br&gt;
But the real power of Power BI doesn’t come from visuals. It comes from how your data is structured behind the scenes.&lt;/p&gt;

&lt;p&gt;Power BI is first and foremost a data modeling engine. Visuals are only the surface layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Data Modeling in Power BI?
&lt;/h2&gt;

&lt;p&gt;Data modeling is the process of organizing your data so Power BI can understand it efficiently.&lt;/p&gt;

&lt;p&gt;A good data model gives Power BI a logical map of your data. It tells the engine: “These are numbers. These are descriptions. This is how they connect.”&lt;/p&gt;

&lt;p&gt;Think of data modeling as the architectural blueprint of a building. If the foundation is weak, the building may look fine at first, but it won’t last. The same is true for Power BI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Essential foundations of data modeling in simple terms, focusing on:
&lt;/h2&gt;

&lt;h3&gt;
  
  
  A. Star schema &amp;amp; Snowflake schema
&lt;/h3&gt;

&lt;p&gt;What Is a Schema?&lt;br&gt;
A schema is the structure or blueprint of how data is organized in a system. It answers one core question: &lt;em&gt;How should Power BI understand this data?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The schema is the logic behind your report. It’s the map that shows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What tables exist&lt;/li&gt;
&lt;li&gt;What columns are in each table&lt;/li&gt;
&lt;li&gt;How tables connect to each other&lt;/li&gt;
&lt;li&gt;How data is arranged&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A schema does not hold the data itself. It defines how the data is organized.&lt;/p&gt;

&lt;p&gt;Instead of one giant table, the data is split into logical pieces. This structure allows Power BI to work efficiently and correctly.&lt;/p&gt;

&lt;p&gt;i) Star schema&lt;br&gt;
When fact and dimension tables connect correctly, they form a star-shaped structure.&lt;/p&gt;

&lt;p&gt;This is called a star schema. The schema is literally the shape of your model.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpbkuobz5z2txoqw0a0qk.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpbkuobz5z2txoqw0a0qk.webp" alt="Star schema model" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The fact table sits in the center. Dimension tables surround it. This layout is simple, clean, and optimized for analytics. It is the preferred structure for most Power BI models because it improves performance and makes reporting more predictable.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2i3983vkw55to5od9oow.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2i3983vkw55to5od9oow.png" alt="Star schema example" width="800" height="546"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each dimension table contains descriptive data used for filtering, grouping, and slicing (e.g., product names, customer regions).&lt;/p&gt;

&lt;p&gt;ii) Snowflake Schema&lt;/p&gt;

&lt;p&gt;A Snowflake Schema is like a star schema but with dimension tables normalized — i.e., broken into several linked tables. &lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Product Dimension → Product → Product Category → Product Brand&lt;/li&gt;
&lt;li&gt;Location Dimension → City → Region → Country&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyn8a9s3chjvhyjdw6gxk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyn8a9s3chjvhyjdw6gxk.png" alt="Snowflake schema model" width="800" height="483"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Instead of one dimension table, dimensions are split into multiple linked tables. While this reduces redundancy, it introduces additional complexity. Snowflake schemas are still valid — they are simply more structured and less straightforward than star schemas.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NOTE BETTER&lt;/strong&gt;&lt;br&gt;
Power BI performs best with star schemas because &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Fast Query Performance – Fewer joins improve speed because Power BI’s engine runs aggregation queries efficiently.&lt;/li&gt;
&lt;li&gt;Simpler DAX &amp;amp; Reporting – Clear structure makes writing measures and visuals straightforward.&lt;/li&gt;
&lt;li&gt;Easy for Users to Understand – Business users and analysts can use models intuitively.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Why Schemas Matter&lt;/strong&gt;&lt;br&gt;
Without a schema:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Power BI guesses relationships&lt;/li&gt;
&lt;li&gt;totals become unreliable&lt;/li&gt;
&lt;li&gt;filtering breaks&lt;/li&gt;
&lt;li&gt;performance slows&lt;/li&gt;
&lt;li&gt;reports become confusing&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;With a good schema:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;numbers aggregate correctly&lt;/li&gt;
&lt;li&gt;filters behave logically&lt;/li&gt;
&lt;li&gt;dashboards load faster&lt;/li&gt;
&lt;li&gt;reports scale cleanly&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Real-World Example of a Schema&lt;/strong&gt;&lt;br&gt;
Imagine a retail company that sells products online. Management wants to analyze sales by customer, product, and time.&lt;/p&gt;

&lt;p&gt;At first, the company stores everything in one giant spreadsheet:&lt;/p&gt;

&lt;p&gt;Order ID | Date | Customer | City | Product | Category | Revenue | Quantity&lt;/p&gt;

&lt;p&gt;This works when the business is small. But as sales grow into millions of rows, problems appear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;repeated customer names&lt;/li&gt;
&lt;li&gt;repeated product descriptions&lt;/li&gt;
&lt;li&gt;slow performance&lt;/li&gt;
&lt;li&gt;inconsistent calculations&lt;/li&gt;
&lt;li&gt;hard-to-maintain reports&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  B. Fact and dimension tables
&lt;/h3&gt;

&lt;p&gt;Every strong Power BI model follows one fundamental principle: &lt;em&gt;Separate numbers from descriptions.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;1) Fact Table:&lt;br&gt;
Holds numeric/ measurable metrics (e.g., sales, revenue, quantity, cost, transactions etc).&lt;br&gt;
Fact tables tend to be large because businesses generate many events. They answer one question: What happened?&lt;/p&gt;

&lt;p&gt;Example fact table:&lt;br&gt;
&lt;code&gt;OrderID | DateID | CustomerID | ProductID | Revenue | Quantity&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;2) Dimension Tables:&lt;br&gt;
Contain descriptive attributes (e.g., Product Name, Customer City, customer names, product categories, regions, dates etc).&lt;br&gt;
Are usually smaller and denormalized (not split into highly normalized tables).&lt;/p&gt;

&lt;p&gt;They provide context and answer:_ Who? What? Where? When?_&lt;/p&gt;

&lt;p&gt;Dimension tables drive filtering and grouping in reports.&lt;/p&gt;

&lt;p&gt;Examples:&lt;br&gt;
Customers → name, city&lt;br&gt;
Products → product name, category&lt;br&gt;
Dates → year, month, day&lt;/p&gt;

&lt;p&gt;These dimension tables connect to the fact table through IDs.&lt;/p&gt;

&lt;h3&gt;
  
  
  C. Relationships
&lt;/h3&gt;

&lt;p&gt;Relationships define how tables connect and how filters propagate between them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Concepts&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Cardinality:
a) One-to-Many (1:*): Most common and preferred (one dimension value → many facts).
b) One-to-One (1:1): Rare; usually when two tables have exactly matching rows.
c) Many-to-Many (:): Possible,but can cause incorrect aggregations and slow performance. &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;NOTE BETTER:&lt;/strong&gt;&lt;br&gt;
Active vs. Inactive Relationships:&lt;br&gt;
Power BI allows multiple relationships between tables, but only one can be active by default. Inactive ones can be used with DAX functions.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Filter Direction:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Usually single-direction — dimension filters fact data.&lt;br&gt;
Bi-directional can be used sparingly, but increases complexity and processing. &lt;/p&gt;

&lt;h3&gt;
  
  
  D. Why modeling is critical for performance and accurate reporting
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Why Good Modeling Is Critical&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A poor model can cause:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Messy Formulas&lt;/li&gt;
&lt;li&gt;Duplicated counts&lt;/li&gt;
&lt;li&gt;Slow dashboards&lt;/li&gt;
&lt;li&gt;Broken filters&lt;/li&gt;
&lt;li&gt;Confusing visuals &lt;/li&gt;
&lt;li&gt;Unreliable insights&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Therefore, before building visuals, always:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identify the fact table (numbers/events)&lt;/li&gt;
&lt;li&gt;Identify dimension tables (descriptions)&lt;/li&gt;
&lt;li&gt;Create clean one-to-many relationships&lt;/li&gt;
&lt;li&gt;Remove duplicate or unnecessary tables&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Model first. Visualize second&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If you remember only one rule, remember this:&lt;br&gt;
&lt;strong&gt;Separate numbers from descriptions and connect them in a clean star schema.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That single principle solves most beginner problems in Power BI. It prevents broken filters, incorrect totals, and slow reports. It turns Power BI from a charting tool into a reliable analytics engine.&lt;/p&gt;

&lt;p&gt;That’s where real Power BI begins.&lt;/p&gt;

</description>
      <category>powerbi</category>
      <category>datamodelling</category>
      <category>beginners</category>
    </item>
    <item>
      <title>How MS Excel Can Be Used for Basic Data Analysis – Beginner-Friendly Guide</title>
      <dc:creator>Isika Millicent</dc:creator>
      <pubDate>Sun, 25 Jan 2026 19:47:30 +0000</pubDate>
      <link>https://dev.to/isika_millicent/how-ms-excel-can-be-used-for-basic-data-analysis-beginner-friendly-guide-2gam</link>
      <guid>https://dev.to/isika_millicent/how-ms-excel-can-be-used-for-basic-data-analysis-beginner-friendly-guide-2gam</guid>
      <description>&lt;p&gt;Imagine Excel as a super-smart notebook. You can type numbers, words, or dates in it, but it can also do math, spot patterns, and summarize information for you. That’s what “data analysis” is all about: making sense of data.&lt;/p&gt;

&lt;p&gt;Here’s how it can be used for basic data analysis:&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Organize Data
&lt;/h2&gt;

&lt;p&gt;In Excel, each piece of information goes into a cell (&lt;em&gt;intersection between rows and columns&lt;/em&gt;).&lt;br&gt;
Use columns for categories (like “Name,” “Age,” or “Sales”) and rows for individual entries (like each person or each sale).&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Sort and Filter
&lt;/h2&gt;

&lt;p&gt;Sort helps you arrange your data from smallest to largest, or alphabetically.&lt;br&gt;
Filter lets you see only the data you care about (like all sales above 200).&lt;br&gt;
This makes it easy to spot trends or specific entries.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzaqtyraf64woqr2mlacl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzaqtyraf64woqr2mlacl.png" alt="Sort &amp;amp; Filter" width="503" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Basic Calculations
&lt;/h2&gt;

&lt;p&gt;Excel can do math automatically. You can use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Sum&lt;/em&gt; (=SUM(B2:B10)) → adds up numbers in a column.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Average&lt;/em&gt; (=AVERAGE(B2:B10)) → finds the mean.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Min / Max&lt;/em&gt; (=MIN(B2:B10) or =MAX(B2:B10)) → finds the smallest or largest number.
Example: To get total sales from the table above, you’d type =SUM(C2:C4) → gives 650.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  4. Use Basic Functions for Quick Insights
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;COUNT&lt;/em&gt; (=COUNT(A2:A10)) → counts how many entries there are.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;COUNTIF&lt;/em&gt; (=COUNTIF(B2:B10, "&amp;gt;25")) → counts entries meeting a condition.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These help summarize data quickly.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Visualize Data with Charts
&lt;/h2&gt;

&lt;p&gt;Excel can turn numbers into charts to make them easier to understand:&lt;br&gt;
Bar charts → compare amounts.&lt;br&gt;
Line charts → show trends over time.&lt;br&gt;
Pie charts → show parts of a whole.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhcxpdky3eedb810dtnad.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhcxpdky3eedb810dtnad.png" alt="Data Visualization using charts" width="800" height="317"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Spot Trends with Conditional Formatting
&lt;/h2&gt;

&lt;p&gt;You can highlight cells that meet certain conditions.&lt;br&gt;
Example: Highlight all sales over 200 → Excel can color those cells automatically.&lt;br&gt;
This helps you see patterns without reading every number.&lt;/p&gt;

&lt;h1&gt;
  
  
  Data Cleaning in Excel (Beginner-Friendly Guide)
&lt;/h1&gt;

&lt;p&gt;Raw data is almost always messy. Excel helps you clean it without needing coding skills.&lt;/p&gt;

&lt;p&gt;1.Removing Blank Cells &amp;amp; Rows&lt;br&gt;
Blank rows can confuse Excel when analyzing data.&lt;/p&gt;

&lt;p&gt;How to do it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select your data&lt;/li&gt;
&lt;li&gt;Go to Data → Filter&lt;/li&gt;
&lt;li&gt;Click the filter arrow → uncheck (Blanks)&lt;/li&gt;
&lt;li&gt;Delete the blank rows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;2.Fixing Inconsistent Text&lt;/p&gt;

&lt;p&gt;Useful tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;UPPER() → makes text all caps&lt;/li&gt;
&lt;li&gt;LOWER() → makes text lowercase&lt;/li&gt;
&lt;li&gt;PROPER() → capitalizes first letters&lt;/li&gt;
&lt;li&gt;TRIM() → removes extra spaces&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example problems:&lt;br&gt;
“Nairobi”, “nairobi”, “NAIROBI”&lt;br&gt;
&lt;code&gt;=TRIM(PROPER(A2))&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;3.Removing Duplicates&lt;br&gt;
Duplicate data can give wrong totals and averages.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl0wljcjg2wib0y1lammx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl0wljcjg2wib0y1lammx.png" alt="Remove Duplicates" width="800" height="212"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select your data&lt;/li&gt;
&lt;li&gt;Go to Data → Remove Duplicates&lt;/li&gt;
&lt;li&gt;Choose the column(s)&lt;/li&gt;
&lt;li&gt;Click OK&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;4.Splitting Data into Columns&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
Millicent Isika → First Name | Last Name&lt;/p&gt;

&lt;p&gt;Steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select the column&lt;/li&gt;
&lt;li&gt;Go to Data → Text to Columns&lt;/li&gt;
&lt;li&gt;Choose Delimited&lt;/li&gt;
&lt;li&gt;Select space or comma&lt;/li&gt;
&lt;li&gt;Finish&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Finally: Building a Simple Dashboard in Excel
&lt;/h2&gt;

&lt;p&gt;A simple Excel dashboard can:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Track performance&lt;/li&gt;
&lt;li&gt;Show trends&lt;/li&gt;
&lt;li&gt;Support decisions&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4z1x6jg1u5vkbhr4u7vm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4z1x6jg1u5vkbhr4u7vm.png" alt="Dashboard" width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Step 1: A good dashboard starts with clean data.&lt;br&gt;
Your data should:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Have clear column headers&lt;/li&gt;
&lt;li&gt;No blank rows&lt;/li&gt;
&lt;li&gt;One type of data per column&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Step 2: Convert data into a table. This makes your dashboard dynamic.&lt;br&gt;
Step 3: Create Key Metrics (KPIs)&lt;br&gt;
Step 4: Create Charts &amp;amp; Slicers&lt;br&gt;
Step 5: Arrange the Dashboard Layout&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>microsoft</category>
      <category>data</category>
    </item>
  </channel>
</rss>
