DEV Community: Isika Millicent

Git Demystified: How I Learned Git, Git Bash, GitHub, and Terminals as a Complete Beginner

Isika Millicent — Wed, 11 Mar 2026 09:47:23 +0000

When I first started learning version control, I kept seeing four things mentioned everywhere: Git, GitHub, Git Bash, and different terminals like Anaconda Prompt or CMD.

And honestly, I was confused.

Not the kind of confusion where you do not understand a command. The kind where you do not even understand where you are supposed to type the command in the first place.

Where the Confusion Started

Let me share what actually happened.

When I first started learning Git, I expected it to be straightforward.

Install Git.
Open Git Bash.
Run the commands.
Done.

Except Git Bash was not working properly on my machine.

I would open it and immediately get a red error message. Something about a child process and DLL rebasing. I had no idea what any of that meant. I tried running it as Administrator. Still broken. I searched online. Got more confused. Closed the terminal. Opened it again. Same error.

I spent more time trying to fix Git Bash than I spent actually learning Git. That felt wrong, but I did not know what else to do.

And here is what made it worse: Every single tutorial I found started with the same two words.

"Open Git Bash."

So in my mind, the entire relationship looked like this: Git - Git Bash - GitHub

Git Bash felt like the place where Git lived. Without it, I assumed I could not use Git at all.

My Workaround — And Why It Was a Problem

Around the same time I was struggling with Git Bash, I had just started learning Python and had installed Anaconda. I was using Anaconda Prompt every day for my Python work, so it was already open on my machine.

Since Git Bash was not cooperating, I took what felt like the easiest path at the time. I started uploading my work directly to GitHub.

No commits. No Git commands. No terminal at all. Just dragging files onto the GitHub website and clicking upload.

At the time, it seemed fine. The code was online, after all. My work was saved. What was the problem?

The problem was that this is not how developers actually work. And the more tutorials I watched, the more obvious that became.

Everyone was using commands like git add, git commit, and git push. They had a history of changes. They could go back to previous versions. They could work in teams without overwriting each other's work.

I was just throwing files at a website and hoping for the best.

Direct uploading skips the entire point of Git. You get no version history, no commit messages, no record of what changed or why.

The Realization That Changed Everything

Since Git Bash was broken, my first question was a practical one: is Git even installed properly on this machine? To check, I first opened CMD, the basic Windows terminal that has always been there, and typed:

git --version

It worked. Git version 2.52.0. No errors. No red text.

Then I tried the same thing in Anaconda Prompt, which I already had open for my Python work:

git --version

It worked there too. That one moment changed how I understood everything.
Git commands do not belong to Git Bash. They belong to Git itself. Git Bash is just one of many places where those commands can run.

Once Git is installed on your computer, any terminal can use it. Not just Git Bash. CMD works. PowerShell works. Anaconda Prompt works. The terminal inside Visual Studio Code works.

The terminal is just the interface. Git is the actual tool doing the work.

That small realization made everything less intimidating. I stopped worrying about which terminal to use and started focusing on learning Git itself.
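To see why the terminal does not matter, here is a minimal Python sketch (assuming Python 3, which Anaconda already provides) of what every terminal does when you type git: look up a program called git on the PATH, then run it. The helper name find_git is hypothetical, not part of Git.

```python
import shutil
import subprocess


def find_git():
    """Return (path, version_line) for the git executable on PATH, or None.

    This mirrors what CMD, PowerShell, Anaconda Prompt and Git Bash do when
    you type `git`: search the PATH for the program, then run it.
    """
    path = shutil.which("git")  # the same PATH lookup a terminal performs
    if path is None:
        return None  # Git is not installed, or it is not on PATH
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    return path, result.stdout.strip()


if __name__ == "__main__":
    found = find_git()
    if found is None:
        print("git not found on PATH")
    else:
        print(f"{found[1]} (from {found[0]})")
```

Run from CMD, PowerShell, Anaconda Prompt or Git Bash, this should usually report the same Git installation, because each terminal is only locating and launching the same program.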

Understanding the Terms

1. What is Git?

Git is a version control system. It keeps a record of every change you make to your files so you can go back to any previous version at any time.

For example:
You are working on an assignment and everything is going well. Then you try something new and suddenly everything breaks. You wish you could go back to the version that was working, but you cannot, because you saved over it.

We've all done something like this:

myassignment.py
myassignment_v2.py
myassignment_FINAL.py
myassignment_FINAL_v2.py
myassignment_FINAL_ACTUALLY_FINAL.py

Git solves this problem: you keep one file, and every version you save is recorded in its history.

Key Benefits of Git

Benefit	What It Means For You
Version History	Every change is recorded. Go back anytime.
Safety Net	Restore the working version instantly.
Teamwork	Multiple people can work on the same project without overwriting each other's work.
Accountability	See what changed, when, and who did it.
Professional Standard	Git is the industry-standard version control system.

2. What is GitHub?

This was my biggest confusion for the longest time. I kept hearing Git and GitHub used together and assumed they were the same thing.

They are not.

Git tracks changes on your computer. GitHub stores your code online. Git was created in 2005. GitHub was built later in 2008 as a platform for hosting Git repositories.

When I uploaded files directly to GitHub without Git, I had storage without version control—like backing up a Word document without using track changes.

Git vs GitHub

Feature	Git	GitHub
What it is	Software on your computer	A website on the internet
Where it lives	Your local machine	Online cloud storage
Main purpose	Track code changes	Store and share code
Needs internet	No	Yes
Free to use	Yes	Yes (with limits)

Why GitHub Matters
Your GitHub profile works as your professional portfolio. Many employers look at it alongside your CV, because it shows them real work, real code, and real progress over time.

Every assignment you push is proof of your skills, and every commit tells a story of how you improved. Start treating your GitHub like your digital CV from day one.

3. What is Git Bash?

Git Bash is simply a terminal, a window where you type commands, that comes bundled with Git when you install it on Windows.

Windows already has terminals like CMD and PowerShell. Git Bash exists because Windows doesn’t natively support Linux-style commands, which are standard in most professional environments.

What Makes Git Bash Useful

Feature	Git Bash	Regular CMD
Shows current branch name	Yes — in the prompt itself	No
Coloured output	Yes — green, red, yellow	Plain white text only
Linux commands (ls, rm, touch)	Yes	No — uses dir, del instead
Tab auto-complete	Yes	Limited
Optimised for Git	Yes	Works but basic

Getting Git Bash
Git Bash is not a separate download. It comes automatically when you install Git:

Go to git-scm.com
Download Git for Windows
Install using the default settings
Git Bash appears in your Start Menu automatically

The Part Nobody Told Me: Any Terminal Works!

This is the core insight of this entire article.

Once Git is installed on your computer and available on your PATH (the default installer settings take care of this), you can run Git commands from any terminal, not just Git Bash.

git init
git add .
git commit -m "first commit"
git push

These commands work in Git Bash. They work in CMD. They work in PowerShell. They work in Anaconda Prompt. They work in the terminal inside Visual Studio Code.

Using Anaconda Prompt for Git
This is the option that saved me when Git Bash was broken, and it is what I genuinely recommend for data science beginners like me.

If you already have Anaconda installed for Python and Jupyter, you already have a terminal that works perfectly for Git. You do not need to install or fix anything extra.

You can manage Python, run Jupyter notebooks, and push to GitHub all from the same Anaconda Prompt window. One tool. Less confusion.

Terminal Comparison

Terminal	Git Support	Best For	Recommended?
Git Bash	Full + extra features	Git-focused work	Yes, once working correctly
Anaconda Prompt	Full support	Data science + Git	Yes — great for beginners
CMD	Full support	Basic Git operations	Yes — works fine
PowerShell	Full support	Advanced Windows tasks	Yes — works fine
VS Code Terminal	Full support	All-in-one development	Yes — very powerful

The Git Commands You Actually Need

Git has many commands, but everyday work needs only a handful. Here they are in plain English.

First-Time Setup — Do This Once Per Computer
Before using Git, tell it who you are:

git config --global user.name "Your Name"
git config --global user.email "your@email.com"
git config --list

Press Enter after each command. The first two print nothing when they succeed, so no output means it worked.

The last command confirms the setup: you should see your name and email in the list.

The Essential Commands

Command	What It Does	Real Life Equivalent
git init	Starts Git tracking in a folder	Installing a security camera for your folder
git status	Shows what changed	Checking your to-do list
git add .	Prepares all files for saving	Putting everything in an envelope
git commit -m "msg"	Saves a version with a label	Sealing the envelope and labelling it
git push	Sends code to GitHub	Handing the envelope to a courier
git pull	Gets latest code from GitHub	Receiving a delivery
git clone	Downloads a full project	Borrowing a book from a library
git log	Shows history of all commits	Reading your diary from the beginning

The Complete Step-by-Step Workflow: First-Time Push to GitHub

Open your terminal (Git Bash, Anaconda Prompt, VS Code Terminal, etc.) and navigate to your project folder:

Step 1 — Navigate to your project folder

cd "C:\Users\YourName\Documents\your_project"

The cd command means Change Directory. If your folder name has spaces, wrap the path in quotes.

Step 2 — Start Git tracking

git init

You will see a message that begins: Initialized empty Git repository

This creates a local Git repository (a hidden .git folder) inside your project folder. Git is now tracking this folder.

Step 3 — Add your files

git add .

The dot means add everything.
To add one specific file, replace the dot with the filename.

Step 4 — Commit your changes

git commit -m "first trial"

This is the actual save. The message in quotes describes what you did. Make it meaningful.

Step 5 — Create a repository on GitHub

Note: Do not initialize it with a README, .gitignore, or license. Starting with an empty repository avoids conflicts on your first push.

Go to github.com and log in
Click the + button in the top right corner
Click New Repository
Give it a name with no spaces — use hyphens instead
Click Create Repository

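Step 6 — Connect your computer to GitHub

git branch -M main

Press Enter. This renames your current branch to main, the default branch name that GitHub and most tutorials now use.

git remote add origin https://github.com/USERNAME/reponame.git

Press Enter. This tells Git where your GitHub repository lives and names that connection origin.

Replace:
USERNAME → your GitHub username
reponame → your repository name

For example:
git remote add origin https://github.com/Isika-01/Revision_Assignment.git

Note: GitHub no longer accepts your account password for Git operations over HTTPS. When you push, Git will ask you to sign in (a username and password prompt, or a sign-in window, depending on your setup); wherever it asks for a password, paste your personal access token. You can embed the token directly in the URL (https://USERNAME:TOKEN@github.com/USERNAME/reponame.git), but Git then saves it in plain text in your project's .git/config file, so the plain URL is safer.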

Step 7 — Push your code to GitHub

git push -u origin main

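If it works, you will see objects being written and a success message. The -u flag links your local main branch to main on GitHub, which is why later pushes only need git push.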

Then check GitHub — your files will be there.

Every Update After That — Just Three Commands
After the first-time setup, every future update takes only three commands:

git add .
git commit -m "describe what you changed"
git push
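The local half of this workflow can be replayed safely in a throwaway folder. The sketch below, assuming Git is installed and on PATH, runs Steps 2 to 4, the branch rename from Step 6, and one later add/commit cycle through Python's subprocess module. It pushes nothing, and the name, email and file name are placeholders passed with git -c so your real settings are untouched.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd):
    """Run one git command inside `cwd` and return its standard output."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, capture_output=True, text=True, check=True
    )
    return result.stdout


def local_workflow_demo():
    """Replay the local steps in a temporary folder; return commit subjects."""
    # Placeholder identity, applied to these commit commands only.
    ident = ["-c", "user.name=Demo User", "-c", "user.email=demo@example.com"]
    with tempfile.TemporaryDirectory() as folder:
        git("init", cwd=folder)                                   # Step 2
        Path(folder, "analysis.py").write_text("print('v1')\n")
        git("add", ".", cwd=folder)                               # Step 3
        git(*ident, "commit", "-m", "first trial", cwd=folder)    # Step 4
        git("branch", "-M", "main", cwd=folder)                   # Step 6, first command

        # A later update: edit, add, commit (git push would follow on GitHub).
        Path(folder, "analysis.py").write_text("print('v2')\n")
        git("add", ".", cwd=folder)
        git(*ident, "commit", "-m", "describe what you changed", cwd=folder)

        # Commit subjects, newest first: this history is what uploads skip.
        return git("log", "--format=%s", cwd=folder).splitlines()


if __name__ == "__main__" and shutil.which("git"):
    print(local_workflow_demo())
```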

GitHub Personal Access Tokens — The Part Nobody Explains Well
I want to be upfront about this one too. When I first tried to push to GitHub and it asked for a password, I typed my regular GitHub password and got an error. Then I spent an hour confused before discovering that GitHub had changed how authentication works.

GitHub no longer accepts your regular account password for Git operations. Instead, you need a Personal Access Token. Think of it as a special password created specifically for Git.

Creating Your Token

Go to github.com and log in
Click your profile picture in the top right corner
Click Settings
Scroll all the way down and click Developer Settings
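Click Personal access tokens, then Tokens (classic)
Click Generate new token (classic)
Choose an expiration date (GitHub warns against No expiration, since a token that never expires stays usable if it ever leaks)
Check the repo box, which pushing requires; workflow and read:org are optional extras for GitHub Actions files and organization access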
Click Generate Token
COPY IT IMMEDIATELY — GitHub will never show it again
Save it in a Notepad file somewhere private on your computer

Token Safety Rules

Save the token privately and do not share it with anyone.
It acts as your Git password, so never post it online or commit it to a repository.
If it is lost or exposed, delete it on GitHub and generate a new one.

Common Git Errors and What They Actually Mean
Errors in Git are normal. Even experienced developers get them regularly. Here are the ones you are most likely to see:

Error	What It Means	How to Fix It
remote origin already exists	You already connected to GitHub	Run `git remote remove origin` then add it again
fatal: not a git repository	Git is not tracking this folder	Run `git init` first
Authentication failed	Wrong username or token	Check your token is correct
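Could not read from remote	Often a wrong repository URL, or an SSH URL (starting with git@github.com:) without SSH keys set up	Check the URL with `git remote -v`; for this guide, use the `https://` URL format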
nothing to commit	No files have been changed	Edit your files first then add and commit

Best Practices for Professional Git Use

Write Meaningful Commit Messages
Your commit messages are a permanent record of your work. Future employers may read them, so make them count.

Good Messages	Bad Messages
Completed SQL assignment with JOIN queries	stuff
Fixed error in data cleaning function	changes
Added visualisation charts for sales report	update
Updated README with setup instructions	aaa
Resolved login bug reported in code review	final

Repository Naming Rules

Use hyphens or underscores instead of spaces
Be descriptive so you remember what the project is
Use lowercase letters for consistency

Good Repository Names	Bad Repository Names
sql-assignment-week1	Python Data Analysis
powerbi-sales-dashboard	

"Python Data Analysis" breaks two of the rules above: it contains spaces and capital letters.
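The naming rules can be applied mechanically. This is a small hypothetical Python helper, not a GitHub feature; the set of allowed characters (letters, digits, hyphens, underscores and dots) is an assumption based on what GitHub accepts in repository names.

```python
import re


def suggest_repo_name(title):
    """Hypothetical helper: turn a project title into a repository name.

    Applies the rules above: lowercase, hyphens instead of spaces, and
    (an assumption) only letters, digits, hyphens, underscores and dots.
    """
    name = title.strip().lower()
    name = re.sub(r"\s+", "-", name)          # spaces -> hyphens
    name = re.sub(r"[^a-z0-9._-]", "", name)  # drop any other characters
    name = re.sub(r"-{2,}", "-", name)        # collapse repeated hyphens
    return name.strip("-")


print(suggest_repo_name("Python Data Analysis"))  # python-data-analysis
```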

Keep Your GitHub Active

Your GitHub profile shows your commit activity over time. Even small, regular commits build a strong profile, and some employers look at how consistently you push code. Treat every assignment as an opportunity to add to your portfolio.

Sometimes the broken path teaches you more than the smooth one.

Mastering SQL Joins and Window Functions with Real Examples

Isika Millicent — Mon, 09 Mar 2026 11:11:43 +0000

If you have been writing SQL for a while, you have probably come across joins and window functions. Both appear constantly in real-world SQL queries, and for good reason: they are among the most powerful tools SQL has to offer.

However, they can be confusing at first. Understanding how tables connect with joins and how window functions analyze data across rows is a key step in becoming comfortable with SQL.

In this article, we will walk through the most common types of joins using a simple example and explain how they work.

1. What are Joins?

In SQL, joins are used to combine rows from two or more tables based on a related column between them.
Think of it as sliding two spreadsheets together so that matching values line up. The join type determines what happens to rows that have no partner on the other side.

To understand joins better, we will use the following two tables.

In the Customers table, customer_id is the primary key.
In the Orders table, order_id is the primary key, while customer_id acts as a foreign key that references the Customers table.

Table 1: Customers Table

| customer_id | customer_name |
|------------|---------------|
| 1          | Alice         |
| 2          | Bob           |
| 3          | Charlie       |
| 4          | Diana         |

Table 2: Orders Table

| order_id | customer_id | total_amount |
|----------|------------|--------------|
| 101      | 1          | 250          |
| 102      | 2          | 150          |
| 103      | 1          | 300          |
| 104      | 3          | 200          |
| 105      | 5          | 100          |

NB:

Customer 4 (Diana) has no orders
Order 105 belongs to customer_id 5, which does not exist in Customers.

A) INNER Join

Returns rows where there is a match in both tables.
If there’s no match, the result set will not include those records.

SELECT c.customer_name, o.order_id, o.total_amount
FROM Customers c
INNER JOIN Orders o ON c.customer_id = o.customer_id;

Result: The above query returns only customers who actually have orders.

customer_name   order_id    total_amount
Alice             101          250
Alice             103          300
Bob               102          150
Charlie           104          200

NB
Notice that Diana (no orders) and Order 105 (no matching customer) are both absent. Only rows with a valid match on both sides make it through.

This is the most common join.
Use it when you only want records that exist in both tables, e.g. orders with valid customers, or students with enrolled courses.
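To try the INNER JOIN yourself without installing a database server, here is a sketch using SQLite through Python's built-in sqlite3 module. It rebuilds the two sample tables in memory; an ORDER BY is added because SQL does not otherwise guarantee row order.

```python
import sqlite3

# The article's sample tables, built in an in-memory SQLite database.
# No FOREIGN KEY is declared: order 105 deliberately points at customer 5,
# who does not exist.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     total_amount INTEGER);
INSERT INTO Customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie'), (4, 'Diana');
INSERT INTO Orders VALUES (101, 1, 250), (102, 2, 150), (103, 1, 300),
                          (104, 3, 200), (105, 5, 100);
""")

inner_rows = con.execute("""
    SELECT c.customer_name, o.order_id, o.total_amount
    FROM Customers c
    INNER JOIN Orders o ON c.customer_id = o.customer_id
    ORDER BY c.customer_name, o.order_id
""").fetchall()

for row in inner_rows:
    print(row)  # Diana and order 105 never appear
```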

B) LEFT (OUTER) Join

This join returns all records from the left table and the matched records from the right table.
If there’s no match, NULL values are returned for the right table’s columns.

In this case, the left table is the customers table and on the right is the orders table.

SELECT c.customer_name, o.order_id, o.total_amount
FROM Customers c
LEFT JOIN Orders o ON c.customer_id = o.customer_id;

Result: The above query returns all customers even those with no orders.

customer_name   order_id   total_amount
Alice              101        250
Alice              103        300
Bob                102        150
Charlie            104        200
Diana              NULL       NULL

Diana now appears in the results, even though she has no orders — her order columns simply return NULL.
Left Join is the foundation of the anti-join pattern — filtering to WHERE o.order_id IS NULL gives you only the customers who have never placed an order.
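The same SQLite sketch for the LEFT JOIN. It also counts orders per customer, a common use of LEFT JOIN: COUNT(o.order_id) ignores NULLs, so Diana gets 0 instead of disappearing from the result.

```python
import sqlite3

# Rebuild the article's sample tables in memory.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     total_amount INTEGER);
INSERT INTO Customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie'), (4, 'Diana');
INSERT INTO Orders VALUES (101, 1, 250), (102, 2, 150), (103, 1, 300),
                          (104, 3, 200), (105, 5, 100);
""")

left_rows = con.execute("""
    SELECT c.customer_name, o.order_id, o.total_amount
    FROM Customers c
    LEFT JOIN Orders o ON c.customer_id = o.customer_id
    ORDER BY c.customer_id, o.order_id
""").fetchall()

# COUNT(column) skips NULLs, so customers without orders count as 0.
order_counts = con.execute("""
    SELECT c.customer_name, COUNT(o.order_id) AS order_count
    FROM Customers c
    LEFT JOIN Orders o ON c.customer_id = o.customer_id
    GROUP BY c.customer_id, c.customer_name
    ORDER BY c.customer_id
""").fetchall()

print(left_rows[-1])  # SQL NULL arrives in Python as None
print(order_counts)
```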

B.i) Anti Join
An Anti Join is used when you want to find records in one table that do not have a corresponding match in another table.

For Example:

Find customers who haven’t purchased anything
Identify products with no sales

Using the tables above, combine a LEFT JOIN with a WHERE o.order_id IS NULL filter:

SELECT c.customer_id, c.customer_name
FROM Customers c
LEFT JOIN Orders o ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL;

Result: For customers without matching orders, order_id returns NULL.
The WHERE o.order_id IS NULL condition filters these unmatched rows.
This is the anti join pattern.

customer_id    customer_name
4                 Diana
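A sketch of the anti join in SQLite, next to NOT EXISTS, a different but widely used way to ask the same question. On this data both return only Diana.

```python
import sqlite3

# Rebuild the article's sample tables in memory.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     total_amount INTEGER);
INSERT INTO Customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie'), (4, 'Diana');
INSERT INTO Orders VALUES (101, 1, 250), (102, 2, 150), (103, 1, 300),
                          (104, 3, 200), (105, 5, 100);
""")

# Anti join as in the article: keep only the LEFT JOIN rows with no match.
left_anti = con.execute("""
    SELECT c.customer_id, c.customer_name
    FROM Customers c
    LEFT JOIN Orders o ON c.customer_id = o.customer_id
    WHERE o.order_id IS NULL
""").fetchall()

# Alternative formulation: customers for whom no matching order exists.
not_exists = con.execute("""
    SELECT c.customer_id, c.customer_name
    FROM Customers c
    WHERE NOT EXISTS (SELECT 1 FROM Orders o WHERE o.customer_id = c.customer_id)
""").fetchall()

print(left_anti, not_exists)
```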

C) RIGHT Join

Returns all rows from the right table and matching rows from the left table.

SELECT c.customer_name, o.order_id, o.total_amount
FROM Customers c
RIGHT JOIN Orders o ON c.customer_id = o.customer_id;

Result: The above query returns all orders, even those without matching customers.

customer_name   order_id   total_amount
Alice             101         250
Alice             103         300
Bob               102         150
Charlie           104         200
NULL              105         100
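Some engines lack RIGHT JOIN (SQLite only added it in version 3.39.0). Any RIGHT JOIN can be rewritten as a LEFT JOIN with the two tables swapped, which is what this SQLite sketch does; it returns the same rows as the result above, sorted by order_id instead.

```python
import sqlite3

# Rebuild the article's sample tables in memory.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     total_amount INTEGER);
INSERT INTO Customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie'), (4, 'Diana');
INSERT INTO Orders VALUES (101, 1, 250), (102, 2, 150), (103, 1, 300),
                          (104, 3, 200), (105, 5, 100);
""")

# "Customers RIGHT JOIN Orders" == "Orders LEFT JOIN Customers":
# every order is kept; customer columns are NULL where nothing matches.
right_rows = con.execute("""
    SELECT c.customer_name, o.order_id, o.total_amount
    FROM Orders o
    LEFT JOIN Customers c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()

for row in right_rows:
    print(row)
```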

D) FULL OUTER Join

Returns all rows from both tables, matched where possible. When a row has no match, the other table's columns are NULL.

SELECT c.customer_name, o.order_id, o.total_amount
FROM Customers c
FULL OUTER JOIN Orders o ON c.customer_id = o.customer_id;

Result: Includes all customers and all orders, matching them where possible. It combines the LEFT and RIGHT JOIN results, so every row from both tables appears.

customer_name   order_id    total_amount
Alice             101           250
Alice             103           300
Bob               102           150
Charlie           104           200
Diana             NULL          NULL
NULL              105          100
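For engines without FULL OUTER JOIN (MySQL, for example, or SQLite before 3.39.0), a common workaround is the LEFT JOIN result plus the right-side anti join, stacked with UNION ALL. This SQLite sketch uses that emulation rather than the FULL OUTER JOIN syntax itself.

```python
import sqlite3

# Rebuild the article's sample tables in memory.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     total_amount INTEGER);
INSERT INTO Customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie'), (4, 'Diana');
INSERT INTO Orders VALUES (101, 1, 250), (102, 2, 150), (103, 1, 300),
                          (104, 3, 200), (105, 5, 100);
""")

full_rows = con.execute("""
    -- Part 1: every customer, with orders where they exist (LEFT JOIN).
    SELECT c.customer_name, o.order_id, o.total_amount
    FROM Customers c
    LEFT JOIN Orders o ON c.customer_id = o.customer_id
    UNION ALL
    -- Part 2: only orders with no matching customer. Part 1 never contains
    -- these rows, so UNION ALL cannot create duplicates.
    SELECT c.customer_name, o.order_id, o.total_amount
    FROM Orders o
    LEFT JOIN Customers c ON c.customer_id = o.customer_id
    WHERE c.customer_id IS NULL
""").fetchall()

for row in full_rows:
    print(row)
```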

E) SELF Join

A Self Join is when a table is joined with itself.

Instead of joining two different tables, you treat the same table as if it were two separate tables by using table aliases.

Self joins are commonly used when a table contains hierarchical or related data within itself, such as:

employees and managers
categories and subcategories

Example Table: Employees

employee_id     employee_name    manager_id
1                  Alice           NULL
2                  Bob             1
3                  Charlie         1
4                  Diana           2

Alice is the top manager (no manager).
Bob and Charlie report to Alice.
Diana reports to Bob.

SELECT e.employee_name AS employee, m.employee_name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.employee_id;

Result: The above query essentially asks: "For each employee, find the person whose employee_id matches their manager_id."

employee     manager
Alice         NULL
Bob           Alice
Charlie       Alice
Diana         Bob

Let me explain:

The employees table is referenced twice in the query.
e represents the employee.
m represents the manager.
We match manager_id from one instance of the table to employee_id in the other.
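A runnable SQLite version of the self join. It also counts the rows an INNER self join would return, to show why the LEFT JOIN is needed to keep Alice, who has no manager.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE employees (employee_id INTEGER PRIMARY KEY,
                        employee_name TEXT, manager_id INTEGER);
INSERT INTO employees VALUES (1, 'Alice', NULL), (2, 'Bob', 1),
                             (3, 'Charlie', 1), (4, 'Diana', 2);
""")

# e = the employee row, m = the same table read again as the manager row.
with_managers = con.execute("""
    SELECT e.employee_name AS employee, m.employee_name AS manager
    FROM employees e
    LEFT JOIN employees m ON e.manager_id = m.employee_id
    ORDER BY e.employee_id
""").fetchall()

# An INNER self join drops Alice, because her NULL manager_id matches nobody.
inner_count = con.execute("""
    SELECT COUNT(*)
    FROM employees e
    INNER JOIN employees m ON e.manager_id = m.employee_id
""").fetchone()[0]

print(with_managers)
print(inner_count)
```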

2. Window Functions

While joins help you combine data across tables, window functions help you analyze data across rows — without collapsing those rows into a single result the way GROUP BY does.

This is the key distinction: GROUP BY aggregates rows and destroys their individual identity. A window function performs the same kind of calculation but preserves every row, attaching the result as a new column alongside the original data.

The syntax uses OVER()

The OVER() clause is what transforms an ordinary aggregate into a window function. An empty OVER() applies the function across the entire result set.
PARTITION BY subdivides that set into groups, restarting the calculation for each one.
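A small SQLite sketch contrasting an empty OVER() with OVER (PARTITION BY customer_id). It queries the Orders table on its own, so order 105 is included and the overall total here is 1000. Window functions need SQLite 3.25 or later, which most current Python installations include.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     total_amount INTEGER);
INSERT INTO Orders VALUES (101, 1, 250), (102, 2, 150), (103, 1, 300),
                          (104, 3, 200), (105, 5, 100);
""")

rows = con.execute("""
    SELECT order_id, customer_id, total_amount,
           -- empty OVER(): one window covering every row
           SUM(total_amount) OVER () AS all_orders_total,
           -- PARTITION BY: the sum restarts for each customer
           SUM(total_amount) OVER (PARTITION BY customer_id) AS customer_total
    FROM Orders
    ORDER BY order_id
""").fetchall()

for row in rows:
    print(row)  # every original row survives, unlike with GROUP BY
```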

a) Ranking Functions (ROW_NUMBER, RANK, DENSE_RANK)

Suppose you want to rank each order by its total_amount — highest to lowest.
Three ranking functions can do this, and their difference only shows up when values tie.

SELECT o.order_id, c.customer_name, o.total_amount,
    -- Always unique; no ties are possible (1, 2, 3, 4 …)
ROW_NUMBER() OVER (ORDER BY o.total_amount DESC) AS row_num,
    -- Tied rows share a rank; the next rank skips (1, 1, 3 …)
RANK() OVER (ORDER BY o.total_amount DESC) AS rnk,
    -- Tied rows share a rank; no ranks are skipped (1, 1, 2 …)
DENSE_RANK() OVER (ORDER BY o.total_amount DESC) AS dense_rnk
FROM Orders o
INNER JOIN Customers c ON c.customer_id = o.customer_id;

Result:

order_id	customer_name	total_amount	row_num	rnk	dense_rnk
103	Alice	300	1	1	1
101	Alice	250	2	2	2
104	Charlie	200	3	3	3
102	Bob	150	4	4	4

NB:

Order 105 does not appear because it has no matching customer — the INNER JOIN filters it out.
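The sample data has no ties, so all three ranking columns agree. This SQLite sketch adds one hypothetical order (106, Bob, 250), purely to create a tie with order 101 and show where the functions differ. order_id is added as a tiebreaker inside ROW_NUMBER so its numbering is repeatable.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     total_amount INTEGER);
INSERT INTO Customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie'), (4, 'Diana');
INSERT INTO Orders VALUES (101, 1, 250), (102, 2, 150), (103, 1, 300),
                          (104, 3, 200), (105, 5, 100);
-- Hypothetical extra order, added only to tie with order 101 (250).
INSERT INTO Orders VALUES (106, 2, 250);
""")

ranked = con.execute("""
    SELECT o.order_id, o.total_amount,
           ROW_NUMBER() OVER (ORDER BY o.total_amount DESC, o.order_id) AS row_num,
           RANK()       OVER (ORDER BY o.total_amount DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY o.total_amount DESC) AS dense_rnk
    FROM Orders o
    INNER JOIN Customers c ON c.customer_id = o.customer_id
    ORDER BY row_num
""").fetchall()

for row in ranked:
    print(row)  # the tie at 250 makes RANK skip 3 and DENSE_RANK not skip
```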

b) Aggregates Over a Window — SUM, AVG, COUNT

One of the most powerful things window functions enable is comparing each row against an overall or group-level aggregate — without GROUP BY collapsing your data.

Here, we calculate the total revenue across all orders and each order's percentage share of that total:

SELECT o.order_id, c.customer_name, o.total_amount,
    SUM(o.total_amount) OVER () AS grand_total,
    ROUND(100.0 * o.total_amount / SUM(o.total_amount) OVER (), 1) AS pct_of_total
FROM Orders o
INNER JOIN Customers c ON c.customer_id = o.customer_id;

Result:

order_id	customer_name	total_amount	grand_total	pct_of_total
101	Alice	250	900	27.8
102	Bob	150	900	16.7
103	Alice	300	900	33.3
104	Charlie	200	900	22.2

Notice that grand_total is the same on every row.
The OVER() means "look across the entire result set."
Each row keeps its own identity while also having access to the overall total.
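A SQLite sketch of the grand-total pattern that also computes each order's percentage share. Multiplying by 100.0 rather than 100 forces decimal division; with plain integers, engines such as SQLite and PostgreSQL would truncate the result (250 * 100 / 900 gives 27, not 27.8).

```python
import sqlite3

# Rebuild the article's sample tables in memory.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     total_amount INTEGER);
INSERT INTO Customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie'), (4, 'Diana');
INSERT INTO Orders VALUES (101, 1, 250), (102, 2, 150), (103, 1, 300),
                          (104, 3, 200), (105, 5, 100);
""")

shares = con.execute("""
    SELECT o.order_id, o.total_amount,
           SUM(o.total_amount) OVER () AS grand_total,
           ROUND(100.0 * o.total_amount / SUM(o.total_amount) OVER (), 1) AS pct_of_total
    FROM Orders o
    INNER JOIN Customers c ON c.customer_id = o.customer_id  -- drops order 105
    ORDER BY o.order_id
""").fetchall()

for row in shares:
    print(row)
```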

c) LAG and LEAD — Comparing Orders Across Rows

LAG looks at the previous row's value and LEAD looks at the next row's value.
Here, we can use LAG to show how much each of Alice's orders changed compared to her previous one.

SELECT o.order_id, c.customer_name, o.total_amount,
    -- Previous order's amount for the same customer:
    -- o.total_amount is the column we "look back" at,
    -- and 1 is how many rows back to look
    LAG(o.total_amount, 1) OVER (PARTITION BY o.customer_id
                                 ORDER BY o.order_id) AS prev_order_amount,
    -- Difference from the previous order
    o.total_amount - LAG(o.total_amount, 1)
        OVER (PARTITION BY o.customer_id
              ORDER BY o.order_id) AS change_from_prev
FROM Orders o
INNER JOIN Customers c ON c.customer_id = o.customer_id
ORDER BY c.customer_name, o.order_id;

Result:

order_id	customer_name	total_amount	prev_order_amount	change_from_prev
101	Alice	250	NULL	NULL
103	Alice	300	250	50
102	Bob	150	NULL	NULL
104	Charlie	200	NULL	NULL

Alice's second order (103) was 50 more than her first (101). Bob and Charlie each have only one order, so their prev_order_amount is NULL: there is no previous row to look back at.
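The same LAG query in SQLite, with a LEAD column added to show the other direction: LEAD returns the next order's amount for the same customer, so Alice's first order can see her second.

```python
import sqlite3

# Rebuild the article's sample tables in memory.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     total_amount INTEGER);
INSERT INTO Customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie'), (4, 'Diana');
INSERT INTO Orders VALUES (101, 1, 250), (102, 2, 150), (103, 1, 300),
                          (104, 3, 200), (105, 5, 100);
""")

lagged = con.execute("""
    SELECT o.order_id, c.customer_name, o.total_amount,
           LAG(o.total_amount, 1) OVER (PARTITION BY o.customer_id
                                        ORDER BY o.order_id) AS prev_order_amount,
           o.total_amount - LAG(o.total_amount, 1) OVER (PARTITION BY o.customer_id
                                                         ORDER BY o.order_id) AS change_from_prev,
           LEAD(o.total_amount, 1) OVER (PARTITION BY o.customer_id
                                         ORDER BY o.order_id) AS next_order_amount
    FROM Orders o
    INNER JOIN Customers c ON c.customer_id = o.customer_id
    ORDER BY c.customer_name, o.order_id
""").fetchall()

for row in lagged:
    print(row)  # NULL (None) wherever there is no previous or next order
```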

d) NTILE — Bucketing Customers by Spend

NTILE(n) divides rows into n roughly equal buckets and assigns each row a bucket number. It is useful for segmenting customers by how much they have spent.

Since Diana has no orders, we first join and aggregate, then apply NTILE to rank the customers who do have orders:

SELECT c.customer_name, SUM(o.total_amount)                               AS total_spent,
NTILE(3) OVER (ORDER BY SUM(o.total_amount) DESC) AS spend_tier,
CASE NTILE(3) OVER (ORDER BY SUM(o.total_amount) DESC)
WHEN 1 THEN 'High Spender'
WHEN 2 THEN 'Mid Spender'
ELSE 'Low Spender'
END AS segment

FROM Orders o
INNER JOIN Customers c ON c.customer_id = o.customer_id
GROUP BY c.customer_name
ORDER BY total_spent DESC;

Result:

customer_name	total_spent	spend_tier	segment
Alice	550	1	High Spender
Charlie	200	2	Mid Spender
Bob	150	3	Low Spender

NB:
NTILE(3) divides the results into 3 roughly equal groups (tiles).

Alice, with a combined spend of 550, lands in Tier 1. Bob and Charlie each have a single order and are bucketed accordingly. Diana does not appear because she has no orders in the Orders table.
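The query above computes NTILE(3) twice. An equivalent restructuring, shown here in SQLite, aggregates in one CTE, assigns the tier once in a second CTE, and labels it with CASE. With only three customers, NTILE(3) puts exactly one customer in each tier; with more rows, bucket sizes would differ by at most one.

```python
import sqlite3

# Rebuild the article's sample tables in memory.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     total_amount INTEGER);
INSERT INTO Customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie'), (4, 'Diana');
INSERT INTO Orders VALUES (101, 1, 250), (102, 2, 150), (103, 1, 300),
                          (104, 3, 200), (105, 5, 100);
""")

tiers = con.execute("""
    WITH spend AS (              -- step 1: one row per customer with orders
        SELECT c.customer_name, SUM(o.total_amount) AS total_spent
        FROM Orders o
        INNER JOIN Customers c ON c.customer_id = o.customer_id
        GROUP BY c.customer_name
    ),
    tiered AS (                  -- step 2: assign the tier once
        SELECT customer_name, total_spent,
               NTILE(3) OVER (ORDER BY total_spent DESC) AS spend_tier
        FROM spend
    )
    SELECT customer_name, total_spent, spend_tier,
           CASE spend_tier
               WHEN 1 THEN 'High Spender'
               WHEN 2 THEN 'Mid Spender'
               ELSE 'Low Spender'
           END AS segment
    FROM tiered
    ORDER BY total_spent DESC
""").fetchall()

for row in tiers:
    print(row)
```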

In conclusion, these two concepts take a lot of practice to master. The best way to build mastery is to take a dataset you already understand (or a spreadsheet you have exported) and deliberately write one query using each join type and one query using each of the window functions covered here.

Connecting Power BI to SQL Databases: A Comprehensive Guide

Isika Millicent — Sun, 08 Mar 2026 20:17:52 +0000

What Is Power BI?
- What Power BI Actually Is
- Why Connect Power BI to a Database — Not Just a Spreadsheet?
- What Is a SQL Database?
Connecting Power BI to a Local PostgreSQL Database
Connecting to Aiven — A Cloud PostgreSQL Database
Loading Tables and Building the Data Model
Why SQL Skills Are Important for Power BI Analysts

Before we dive into all this, take a moment to think about your own data setup:

How many different sources does your organization use? (Excel, Google Sheets, CSV exports, CRM systems?)

Do multiple people ever overwrite each other’s work?

How long does it take to update reports or dashboards?

If you answered “many”, “yes”, or “too long” to any of these, connecting Power BI to a SQL database could save you hours, or even days, every month.

1. What Is Power BI?

If you’ve ever looked at a large spreadsheet full of rows and columns, you know how difficult it can be to quickly understand what the data is actually telling you. Important insights can easily get buried in raw numbers, making it hard to see patterns, trends, or key business metrics.

Look at this Excel sheet for example and see how hard it is to interpret the data.

This challenge exists whether you’re a business owner, a student learning data analytics, or a professional working with large datasets.

This is where Power BI comes in.

Power BI is a business intelligence tool developed by Microsoft that allows users to connect to multiple data sources, transform and analyze data, and build interactive dashboards and reports.

Using visual elements such as charts, maps, tables, and KPIs, Power BI turns raw data into clear, actionable insights that are easy to understand.

While Power BI can connect to many different data sources such as Excel files and cloud services, one of the most common and powerful integrations is with SQL databases.

Many organizations store their operational data in relational databases like PostgreSQL, MySQL, or Microsoft SQL Server. These databases contain structured information such as transactions, customer records, product inventories etc.

The Role of SQL Databases in Analytics
SQL stands for Structured Query Language.
A SQL database organizes data into tables, which look similar to spreadsheets (rows and columns) but with strict rules about what each column can contain and how tables relate to each other.

You use SQL, the language, to ask questions of the database, like: "Show me all properties in Nairobi that sold for more than 10 million shillings this year."

For analytical workloads, SQL databases serve as the foundation from which BI tools like Power BI extract, transform, and visualize data.

Whether it is customer records, product catalogues, sales transactions, or inventory logs, the database acts as the main storage of business data.

Why Connect Power BI to a Database, Not Just a Spreadsheet (Such as an Excel File)?

Many businesses start by storing their data in Excel spreadsheets. For small datasets, this works fine.

However, as the volume of data grows, spreadsheets quickly become difficult to manage.

Let us consider a property management company tracking 5,000 rental units across multiple cities. Their data includes tenant records, property details, rental payments, and transaction histories. Managing all of this in a single spreadsheet would be inefficient and prone to errors.

A SQL database solves these problems in several ways

Scalability: Databases can store and query millions of rows efficiently, far beyond what a spreadsheet handles comfortably.
Multi-user access: Multiple users can read and update data simultaneously without overwriting each other’s work.
Structured data integrity: Data is stored in well-organized tables with defined relationships and rules. For example, every property can be linked to a valid agent ID, preventing incomplete or inconsistent records.
Automated reporting: Power BI can connect directly to the database and refresh reports from it (manually in Power BI Desktop, or on a schedule once a report is published to the Power BI service), so dashboards reflect the latest data as of each refresh.

In the next section, we’ll walk through how to connect Power BI to a SQL database step by step.

Throughout this guide, we use Power BI Desktop — that is where all the connecting, modelling, and building happens.

2. Connecting Power BI to a Local PostgreSQL Database

"Local" simply means the database is installed on your own computer.

Step 1: Open Power BI Desktop

Launch Power BI Desktop and navigate to the Home tab.
Click Get Data.

Step 2: Select PostgreSQL as the Data Source

In the Get Data window, search for PostgreSQL database and select it.

Step 3: Enter the Server and Database Details
In the connection dialog box, enter the following information:

Server: localhost
Database: lux_sales

If PostgreSQL is installed locally, the server name localhost refers to your own computer.

In the same dialog, you also choose a data connectivity mode:

Import Mode – copies the data into Power BI for faster analysis.
DirectQuery Mode – queries the database in real time without storing the data in Power BI.

Most beginners use Import Mode because it provides faster performance and is easier to work with during analysis.

Step 4: Enter Credentials
Next, Power BI will ask for database authentication details:

Username
Password

These are the same credentials used to log into the PostgreSQL database.
After authentication, Power BI establishes the connection to the database.

Step 5: Select Tables in the Navigator

Once connected, the Navigator window displays the available tables in the database.

Users can preview the tables before loading them into Power BI.

Select the required tables and click Load.

The tables will now appear in Power BI’s Data View and will be available for analysis.

3. Connecting Power BI to a Cloud PostgreSQL Database (Aiven)

In many organizations, databases are hosted in the cloud instead of on local machines. One platform that provides managed PostgreSQL databases is Aiven.

With managed services like this, organizations do not need to worry about maintaining servers, backups, or updates. Instead, they can focus on analyzing their data.

A typical analytics workflow might involve several tools:

Aiven - hosting the database in the cloud
PostgreSQL - storing the structured data
DBeaver - for exploring and querying the database
Power BI - for building dashboards and visualizations

Step 1: Obtain Database Connection Details from Aiven

To connect Power BI to a cloud database, you first need the connection details provided by Aiven.
Inside the Aiven dashboard, you will find the following parameters: Host, Port, Database name, Username, Password

It would look like this:

These details are required for any tool connecting to the database, including Power BI or DBeaver.

Step 2: Connect Power BI to PostgreSQL

Click Get Data.
Select PostgreSQL database, and you will see this pop-up:

Enter the connection details from Aiven:

Host → pg-3b04de5f-millicentisika-2259.j.aivencloud.com
Port → 11547

In the Server field, combine the host and port, separated by a colon:

pg-3b04de5f-millicentisika-2259.j.aivencloud.com:11547

In the Database field, enter the name of the database you want to work with.
In this example, the database I wanted to use for analysis was luxsales.

Choose Import mode for data connectivity.
Click OK.
Power BI will then prompt for authentication.

Step 3: Download the SSL Certificate

Most cloud databases require secure SSL connections.

You may encounter this error while trying to connect:

This occurs because Aiven enforces secure SSL connections.

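Database tools such as DBeaver can usually connect without extra setup, but Power BI validates the server's certificate against the Windows certificate store, so Aiven's CA certificate has to be trusted there first.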

To resolve this, download and install the SSL certificate provided by Aiven.

Step 3.a: Download the Certificate from Aiven

Go to console.aiven.io
Open your PostgreSQL service
Navigate to the Overview tab
Scroll to Connection Information
Click Download CA Certificate

This downloads a file named:

ca.pem

Step 3.b: Convert the Certificate Format

The Windows certificate installer recognizes .crt files, so the certificate needs a .crt extension (the contents of the file stay the same).

Locate the downloaded file (usually in Downloads).

Rename the file from:

ca.pem to ca.crt

To do this:

Right-click the file
Select Rename
Change the extension to .crt

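If Windows hides file extensions:

In File Explorer, open the View menu and turn on File name extensions before renaming; otherwise you may end up with a file called ca.crt.pem.
Alternatively, open ca.pem in a text editor, choose Save As, select Save as type → All Files, and manually type: ca.crt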
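If renaming by hand keeps fighting hidden extensions, a small Python script can make the copy instead. This is a minimal sketch, assuming Python is installed and the certificate was saved to your Downloads folder as ca.pem; the function name copy_pem_as_crt is hypothetical. It only copies the file under a .crt extension (the PEM text inside is unchanged) and first checks that the file really looks like a PEM certificate.

```python
from pathlib import Path
import shutil

PEM_HEADER = "-----BEGIN CERTIFICATE-----"


def copy_pem_as_crt(pem_path: Path) -> Path:
    """Hypothetical helper: copy ca.pem to ca.crt in the same folder."""
    text = pem_path.read_text(encoding="ascii", errors="replace")
    if PEM_HEADER not in text:
        raise ValueError(f"{pem_path} does not look like a PEM certificate")
    crt_path = pem_path.with_suffix(".crt")  # ca.pem -> ca.crt
    shutil.copyfile(pem_path, crt_path)      # same bytes, new extension
    return crt_path


if __name__ == "__main__":
    # Assumed download location; adjust if your browser saves elsewhere.
    source = Path.home() / "Downloads" / "ca.pem"
    if source.exists():
        print("Created", copy_pem_as_crt(source))
    else:
        print("ca.pem not found at", source)
```

You can then double-click the new ca.crt and continue with the installation steps.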

Step 3.c: Install the Certificate in Windows

Double-click the ca.crt file.
Click Install Certificate.
Select Local Machine, then click Next.
Choose Place all certificates in the following store.
Click Browse.
Select Trusted Root Certification Authorities.
Click OK → Next → Finish.
You should see the confirmation message: The import was successful.

Step 3.d: Restart Power BI and Reconnect

After installing the certificate, close Power BI Desktop completely.
Reopen Power BI.

Connect again using the same PostgreSQL connection details.

Server: pg-3b04de5f-millicentisika-2259.j.aivencloud.com:11547
Database: luxsales
Mode: Import

Once the connection is successful, Power BI will display a Navigator window where you can select tables and load them for analysis.

The same steps work whether PostgreSQL runs on your local computer or on a cloud service like Aiven. The certificate installation simply allows your computer to trust secure connections from Aiven’s servers; Power BI then connects directly to the cloud PostgreSQL database over the internet.

4. Loading Tables and Building the Data Model

Once connected to the database, the next step is to load the required tables and establish relationships between them.

Step 1: Select Tables in the Navigator

After connecting to PostgreSQL (local or cloud), the Navigator window displays all available tables in the database.

For this example, we will load the following tables:

customers
products
sales
inventory

Select all four tables and click Load.

Step 2: Understanding the Data Model

Once loaded, Power BI stores the tables in its data model. To view and manage relationships between tables, navigate to the Model View by clicking the model icon on the left sidebar.

What Is a Data Model?
A data model defines how tables relate to each other.
In a well-designed model:

Each table has a primary key (a unique identifier for each row)
Tables are connected through foreign keys (columns that reference primary keys in other tables)

For example:

The sales table contains a customer_id column that links to the customer_id in the customers table
The sales table contains a product_id column that links to the product_id in the products table

Step 3: Creating Relationships in Power BI

Power BI automatically detects some relationships, but you may need to create or modify them manually.

Common relationships in this model:

sales.customer_id → customers.customer_id (many-to-one)
sales.product_id → products.product_id (many-to-one)
inventory.product_id → products.product_id (one-to-one or many-to-one)

Relationship Cardinality

Each relationship has a cardinality that defines how rows in one table relate to rows in another:

One-to-Many (1:*): One customer can have many sales transactions
Many-to-One (*:1): Many sales belong to one customer
One-to-One (1:1): One product has one inventory record (less common)

Power BI displays the cardinality on the relationship line with symbols like 1 and *.

Why Relationships Matter

Properly defined relationships enable Power BI to:

Filter correctly across tables: When you select a customer, Power BI automatically shows only their sales
Aggregate accurately: Calculating total sales per customer requires the relationship between sales and customers
Avoid duplicate counts: Without relationships, metrics like total revenue might be calculated incorrectly
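To see why the sales → customers relationship matters, the sketch below does in SQL roughly what Power BI does when a visual shows total sales per customer. Python's built-in sqlite3 module stands in for the PostgreSQL database, and the tables and rows are invented for illustration. The first query checks that customer_id is unique on the "one" side, which a many-to-one relationship assumes; the second follows the relationship to total sales per customer.

```python
import sqlite3

# SQLite stands in for PostgreSQL; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Alice'), (2, 'Brian');
INSERT INTO sales VALUES (1, 1, 100), (2, 1, 250), (3, 2, 75);
""")

# The "one" side of a many-to-one relationship must have unique keys.
dupes = conn.execute("""
    SELECT customer_id FROM customers
    GROUP BY customer_id HAVING COUNT(*) > 1
""").fetchall()

# Follow sales.customer_id -> customers.customer_id and total per customer.
totals = conn.execute("""
    SELECT c.customer_name, SUM(s.amount) AS total_sales
    FROM sales AS s
    JOIN customers AS c ON s.customer_id = c.customer_id
    GROUP BY c.customer_name
    ORDER BY c.customer_name
""").fetchall()

print(dupes)   # []
print(totals)  # [('Alice', 350.0), ('Brian', 75.0)]
```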

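Star schema is the most common modeling pattern. For these tables, the model looks like this (Inventory links to Sales through Products, because inventory.product_id references products.product_id):

      Customers
          |
    Sales (Fact Table - Center)
          |
      Products
          |
      Inventory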

Why SQL Skills Are Important for Power BI Analysts

Although Power BI provides powerful visualization tools, SQL skills remain essential for analysts. SQL enables them to efficiently retrieve, manipulate, and prepare data stored in relational databases before using it in visualization tools like Power BI.

Key SQL capabilities include:
i) Retrieving Data
Instead of exporting entire datasets, analysts can use queries such as SELECT to extract only the columns and rows relevant to their analysis.

Example:
SELECT customer_id, product_id, amount, order_date FROM sales WHERE order_date >= '2025-01-01';

ii) Filtering Data
SQL allows analysts to filter large datasets before importing them into Power BI.
Using conditions like WHERE, analysts can narrow results based on specific criteria such as dates, locations, or product categories.

Example:
SELECT * FROM customers WHERE country = 'Kenya';

iii) Aggregating Data
Analysts often need summary insights rather than raw data.

SQL provides functions such as SUM(), COUNT(), AVG(), MIN(), and MAX() to calculate totals, averages, and other statistical measures.

When combined with GROUP BY, these functions allow analysts to summarize data by categories such as product type, department, or sales region.

Example:
SELECT product_id, SUM(amount) AS total_sales FROM sales GROUP BY product_id;

iv) Preparing Data for Visualization
SQL helps transform raw tables into structured datasets suitable for dashboards.
Examples include:
• Joining tables
• Creating calculated fields
• Building views
• Filtering unnecessary records

By performing these steps directly within the database, SQL ensures that the data being imported into dashboard tools like Power BI is already clean, organized, and ready for analysis.

This not only improves performance but also allows analysts to build more accurate and efficient dashboards.
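As a sketch of "building views", the snippet below creates a view that joins, filters, and aggregates inside the database, so the BI tool only has to import a small, clean result. SQLite (through Python's sqlite3) stands in for PostgreSQL here; the view name sales_by_category and the sample rows are hypothetical. In PostgreSQL the CREATE VIEW statement looks essentially the same, and Power BI's Navigator lists views alongside tables, so you could load the summarized view instead of the raw tables.

```python
import sqlite3

# SQLite stands in for PostgreSQL; tables, rows, and the view name are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER,
                    amount REAL, order_date TEXT);
INSERT INTO products VALUES (1, 'Laptops'), (2, 'Phones');
INSERT INTO sales VALUES
    (1, 1, 900, '2024-12-30'),   -- before the cut-off, filtered out
    (2, 1, 1100, '2025-01-04'),
    (3, 2, 400, '2025-01-05'),
    (4, 2, 350, '2025-02-11');

-- Join, filter, and aggregate once, inside the database.
CREATE VIEW sales_by_category AS
SELECT p.category,
       COUNT(*)      AS orders,
       SUM(s.amount) AS total_sales
FROM sales AS s
JOIN products AS p ON s.product_id = p.product_id
WHERE s.order_date >= '2025-01-01'
GROUP BY p.category;
""")

rows = conn.execute("SELECT * FROM sales_by_category ORDER BY category").fetchall()
print(rows)  # [('Laptops', 1, 1100.0), ('Phones', 2, 750.0)]
```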

Conclusion

Connecting Power BI to a SQL database allows analysts to move beyond static spreadsheets and work directly with structured, scalable data sources.

Whether the database is running locally or hosted in the cloud using services like Aiven, the workflow is the same: connect to the database, load the relevant tables, model the relationships, and build visual insights.

By combining SQL for data preparation and Power BI for visualization, analysts can build powerful dashboards that turn raw data into actionable business insights.

Advanced SQL Techniques Every Data Analyst Should Know

Isika Millicent — Fri, 06 Mar 2026 16:27:28 +0000

In this article we will cover:

Advanced Aggregations
Advanced Set Operations
Window Functions
CTEs and Query Structuring
Subqueries
EXISTS vs NOT EXISTS
Query Optimization

Most real-world insights come from combining multiple SQL techniques.

While a basic understanding of SQL (e.g., selecting columns, filtering rows, and joining tables) is essential for anyone working with data, mastering advanced SQL techniques is what truly separates a new data analyst from an expert data analyst. These advanced skills unlock deeper insights, help solve complex business problems, and ultimately enable more confident, data-driven decision-making.

If you have spent any time working with data, you are probably already familiar with the fundamentals of SQL: pulling rows from a table (SELECT), filtering results using conditions (WHERE), and perhaps joining two tables together (JOIN). That foundation is important.

But the moment you start working as a data analyst, the questions change. Suddenly you're asked things like:

Which customers made a purchase this month but haven’t returned since?
For each sales representative, what was their running total of revenue by the end of every quarter?
Which product categories fall within the top 10% of sales performance?

Simple SQL isn't enough anymore. That’s where advanced SQL techniques come in.

1. Advanced Aggregations: CASE WHEN, FILTER, and GROUPING SETS

Aggregation in SQL goes far beyond COUNT() and SUM().
Business reporting often requires conditional aggregation, multiple grouping levels, and flexible summarization.
Three tools make this much easier: CASE WHEN, FILTER, and GROUPING SETS.

a. CASE WHEN is used for conditional logic inside a query. It also lets you compute conditional aggregates, for example counting only orders above a certain value, or separating revenue by customer tier within the same row.

Think of it as SQL’s version of an if/else statement.

Example: Categorize customers based on spending.

SELECT customer_name, total_spent, CASE WHEN total_spent > 1000 THEN 'High Value' WHEN total_spent > 500 THEN 'Medium Value' ELSE 'Low Value' END AS customer_category FROM customers;

Result:

customer_name  total_spent  customer_category
Alice          1200         High Value
Brian          650          Medium Value
John           200          Low Value
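You can reproduce the result above with Python's built-in sqlite3 module, which runs the same CASE WHEN expression unchanged. This is a minimal sketch using the three sample customers, with SQLite standing in for whatever database you actually use. Because WHEN branches are checked in order, Alice (1200) matches the first condition and never reaches the second. The last query shows the conditional-aggregate form mentioned above.

```python
import sqlite3

# SQLite stands in for your production database; rows mirror the table above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_name TEXT, total_spent INTEGER);
INSERT INTO customers VALUES ('Alice', 1200), ('Brian', 650), ('John', 200);
""")

rows = conn.execute("""
    SELECT customer_name, total_spent,
           CASE
               WHEN total_spent > 1000 THEN 'High Value'
               WHEN total_spent > 500  THEN 'Medium Value'
               ELSE 'Low Value'
           END AS customer_category
    FROM customers
    ORDER BY total_spent DESC
""").fetchall()

# Conditional aggregate: count only the high-value customers.
high_value_count = conn.execute("""
    SELECT SUM(CASE WHEN total_spent > 1000 THEN 1 ELSE 0 END) FROM customers
""").fetchone()[0]

for row in rows:
    print(row)
print(high_value_count)  # 1
```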

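b. FILTER is a cleaner alternative to CASE WHEN for conditional aggregation (supported in PostgreSQL, though not in every database; SQL Server and MySQL, for example, lack it).
Instead of writing CASE WHEN inside the aggregate, you attach a filter directly to the aggregate.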

Example
SELECT COUNT(*) FILTER (WHERE region = 'Kenya') AS kenya_customers, COUNT(*) FILTER (WHERE region = 'Uganda') AS uganda_customers FROM customers;
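To check that FILTER and CASE WHEN really are interchangeable, the sketch below runs both forms against a few made-up customers in SQLite, which has supported the FILTER clause on aggregates since version 3.30 (an assumption about the SQLite build bundled with your Python). Both result rows should be identical.

```python
import sqlite3

# SQLite (3.30+) stands in for PostgreSQL; the customers are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
INSERT INTO customers (region) VALUES ('Kenya'), ('Kenya'), ('Uganda'), ('Tanzania');
""")

with_filter = conn.execute("""
    SELECT COUNT(*) FILTER (WHERE region = 'Kenya')  AS kenya_customers,
           COUNT(*) FILTER (WHERE region = 'Uganda') AS uganda_customers
    FROM customers
""").fetchone()

# Equivalent CASE WHEN form, for databases without FILTER.
with_case = conn.execute("""
    SELECT SUM(CASE WHEN region = 'Kenya'  THEN 1 ELSE 0 END),
           SUM(CASE WHEN region = 'Uganda' THEN 1 ELSE 0 END)
    FROM customers
""").fetchone()

print(with_filter, with_case)  # (2, 1) (2, 1)
```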

c. GROUPING SETS: This is more advanced and extremely powerful.
It allows you to calculate several GROUP BY aggregations in one query.
Without it, you would write one query per grouping and combine them with UNION ALL.

Example
region  product  sales
Kenya   Laptop   200
Kenya   Phone    300
Uganda  Laptop   150

Suppose you want sales by region, sales by product, and the grand total.
Normally you'd write three queries.

But with GROUPING SETS:

SELECT region, product, SUM(sales) AS total_sales FROM sales GROUP BY GROUPING SETS ((region), (product),());

Result (row order may vary without ORDER BY):

region  product  total_sales
Kenya   NULL     500
Uganda  NULL     150
NULL    Laptop   350
NULL    Phone    300
NULL    NULL     650
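SQLite does not implement GROUPING SETS, so as a sketch of what the clause saves you from, here is the longer UNION ALL version run on the sample rows (PostgreSQL accepts the GROUPING SETS query directly). It produces five rows: two regional subtotals, two product subtotals, and the grand total, with NULL marking the column that was rolled up.

```python
import sqlite3

# SQLite stands in for PostgreSQL and lacks GROUPING SETS, so we emulate it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, product TEXT, sales INTEGER);
INSERT INTO sales VALUES ('Kenya', 'Laptop', 200), ('Kenya', 'Phone', 300),
                         ('Uganda', 'Laptop', 150);
""")

# One SELECT per grouping set, stitched together with UNION ALL.
rows = conn.execute("""
    SELECT region, NULL AS product, SUM(sales) AS total_sales
    FROM sales GROUP BY region
    UNION ALL
    SELECT NULL, product, SUM(sales) FROM sales GROUP BY product
    UNION ALL
    SELECT NULL, NULL, SUM(sales) FROM sales
""").fetchall()

for row in rows:
    print(row)
```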

This kind of aggregation is frequently requested by business stakeholders who want a single summary table rather than multiple separate queries.

2. Advanced Set Operations (UNION, UNION ALL, INTERSECT, EXCEPT)

Set operations combine the results of two SELECT queries.
They require:

Same number of columns
Compatible data types
Same column order

A. UNION / UNION ALL
UNION combines the results of two queries and removes duplicates automatically; UNION ALL keeps them.

Example Tables

Table: customers_2024
customer_id name
1 Alice
2 Brian
3 John

Table: customers_2025
customer_id name
3 John
4 Mary
5 James

QUERY
SELECT name FROM customers_2024 UNION SELECT name FROM customers_2025;

RESULT
name
Alice
Brian
John
Mary
James

Think of set operations as working with result sets, not tables directly.
NB/ John appears in both tables.
UNION removes the duplicate.
UNION ALL keeps every row, including duplicates, and is faster because it skips the de-duplication step.
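Running both operators on the two example tables makes the difference countable. This sketch uses SQLite through Python's sqlite3 module as a stand-in database; note that neither operator guarantees row order, so add ORDER BY if order matters.

```python
import sqlite3

# SQLite stands in for PostgreSQL; tables mirror the example above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers_2024 (customer_id INTEGER, name TEXT);
CREATE TABLE customers_2025 (customer_id INTEGER, name TEXT);
INSERT INTO customers_2024 VALUES (1, 'Alice'), (2, 'Brian'), (3, 'John');
INSERT INTO customers_2025 VALUES (3, 'John'), (4, 'Mary'), (5, 'James');
""")

union_rows = conn.execute(
    "SELECT name FROM customers_2024 UNION SELECT name FROM customers_2025"
).fetchall()
union_all_rows = conn.execute(
    "SELECT name FROM customers_2024 UNION ALL SELECT name FROM customers_2025"
).fetchall()

# UNION drops the second 'John'; UNION ALL keeps it.
print(len(union_rows), len(union_all_rows))  # 5 6
```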

B. INTERSECT
This returns only the rows that exist in both queries

When to Use INTERSECT

Identifying returning customers
Matching common records between systems
Data validation

Example use case:
Customers who bought in both 2024 and 2025. Using the same tables:

SELECT name FROM customers_2024 INTERSECT SELECT name FROM customers_2025;

Result
name
John

C. EXCEPT
It returns rows from the first query that do NOT exist in the second query.

Think of it as: “Show me what's in A but not in B”

SELECT name FROM customers_2024 EXCEPT SELECT name FROM customers_2025;

Result
name
Alice
Brian

When to Use EXCEPT

Finding new customers
Detecting missing records
Comparing two datasets
Data auditing

Question: When would you use EXCEPT instead of LEFT JOIN?
I use EXCEPT when I want to compare two result sets directly and identify records present in one dataset but not another, especially for data validation tasks.
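The sketch below runs INTERSECT and EXCEPT on the same two tables and then writes the EXCEPT question as a LEFT JOIN anti-join, so you can compare the two styles side by side. SQLite via Python's sqlite3 stands in for your database. One practical difference: EXCEPT compares the whole rows returned by both SELECTs, while the LEFT JOIN matches on a key you choose and can still return other columns from the first table.

```python
import sqlite3

# SQLite stands in for PostgreSQL; tables mirror the example above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers_2024 (customer_id INTEGER, name TEXT);
CREATE TABLE customers_2025 (customer_id INTEGER, name TEXT);
INSERT INTO customers_2024 VALUES (1, 'Alice'), (2, 'Brian'), (3, 'John');
INSERT INTO customers_2025 VALUES (3, 'John'), (4, 'Mary'), (5, 'James');
""")

both_years = conn.execute("""
    SELECT name FROM customers_2024
    INTERSECT
    SELECT name FROM customers_2025
""").fetchall()

only_2024 = conn.execute("""
    SELECT name FROM customers_2024
    EXCEPT
    SELECT name FROM customers_2025
    ORDER BY name
""").fetchall()

# Same question as EXCEPT, written as a LEFT JOIN anti-join.
only_2024_join = conn.execute("""
    SELECT a.name
    FROM customers_2024 AS a
    LEFT JOIN customers_2025 AS b ON a.name = b.name
    WHERE b.name IS NULL
    ORDER BY a.name
""").fetchall()

print(both_years)      # [('John',)]
print(only_2024)       # [('Alice',), ('Brian',)]
print(only_2024_join)  # [('Alice',), ('Brian',)]
```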

3. Window Functions

Window functions are one of the most powerful tools in SQL analytics.
They allow you to perform calculations across rows while still keeping every row in the result.

Example problem: You want to see each employee's sales alongside their department's total sales.

A normal GROUP BY would collapse rows.

Window functions avoid that.

The syntax centers on the OVER() clause, which defines the 'window' — the set of rows the function considers:

SELECT employee_name, department, sales_amount, SUM(sales_amount) OVER (PARTITION BY department) AS dept_total, ROUND(sales_amount * 100.0 / SUM(sales_amount) OVER (PARTITION BY department), 2) AS pct_of_dept FROM employee_sales;

Here, PARTITION BY department tells SQL to calculate the sum separately for each department.
Every row for the Marketing team gets Marketing's total; every row for Engineering gets Engineering's total. No rows are removed; no separate subquery is needed.
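Here is that query running on a handful of invented rows, using SQLite through Python's sqlite3 module (SQLite has supported window functions since version 3.25). The employee_sales table and its values are hypothetical; the point is that all five input rows survive, each carrying its department total and percentage share.

```python
import sqlite3

# SQLite stands in for PostgreSQL; employee_sales and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_sales (employee_name TEXT, department TEXT, sales_amount REAL);
INSERT INTO employee_sales VALUES
    ('Amina', 'Marketing', 300), ('Ben', 'Marketing', 100),
    ('Chen', 'Engineering', 200), ('Dara', 'Engineering', 200),
    ('Eli', 'Engineering', 600);
""")

rows = conn.execute("""
    SELECT employee_name, department, sales_amount,
           SUM(sales_amount) OVER (PARTITION BY department) AS dept_total,
           ROUND(sales_amount * 100.0
                 / SUM(sales_amount) OVER (PARTITION BY department), 2) AS pct_of_dept
    FROM employee_sales
    ORDER BY department, employee_name
""").fetchall()

for row in rows:
    print(row)
```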

Window functions become even more powerful with ranking.

The ROW_NUMBER(), RANK(), and DENSE_RANK() functions let you rank records within groups.

ROW_NUMBER() - gives every row a unique number. Even if two rows have the same value, they still get different numbers.
RANK() - gives the same rank to tied values, but it skips numbers after ties.
DENSE_RANK() - also gives the same rank for ties, but does NOT skip numbers.

Example table: employees
name   salary
Alice  9000
Bob    8000
Carol  8000
David  7000

Query 1: ROW_NUMBER()
SELECT name, salary, ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num FROM employees;

Result
name   salary  row_num
Alice  9000    1
Bob    8000    2
Carol  8000    3
David  7000    4

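NB/
Bob and Carol have the same salary, but they still get different row numbers (which of them gets 2 is not guaranteed unless you add a tie-breaker, such as name, to the ORDER BY).
So, ROW_NUMBER() forces each row to be unique.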

Query 2: RANK()
SELECT name, salary, RANK() OVER (ORDER BY salary DESC) AS rank_num FROM employees;

Result
name   salary  rank_num
Alice  9000    1
Bob    8000    2
Carol  8000    2
David  7000    4

NB/
Bob and Carol tie at rank 2, so SQL skips rank 3 and goes to 4.

Query 3: DENSE_RANK()
SELECT name, salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank FROM employees;

Result
name   salary  dense_rank
Alice  9000    1
Bob    8000    2
Carol  8000    2
David  7000    3

NB/ No number is skipped.
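Running all three functions in one query makes the differences easy to compare. This sketch uses SQLite through Python's sqlite3 module with the employees table above. It adds name as a tie-breaker in ROW_NUMBER's ORDER BY (an addition not in the original query) so that Bob reliably gets 2 and Carol 3.

```python
import sqlite3

# SQLite stands in for PostgreSQL; rows mirror the employees table above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, salary INTEGER);
INSERT INTO employees VALUES ('Alice', 9000), ('Bob', 8000), ('Carol', 8000), ('David', 7000);
""")

rows = conn.execute("""
    SELECT name, salary,
           ROW_NUMBER() OVER (ORDER BY salary DESC, name) AS row_num,
           RANK()       OVER (ORDER BY salary DESC)       AS rank_num,
           DENSE_RANK() OVER (ORDER BY salary DESC)       AS dense_rank_num
    FROM employees
    ORDER BY row_num
""").fetchall()

for row in rows:
    print(row)
```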

This is just one example of the power of window functions. They can be used for other analytical tasks, such as:

Ranking: NTILE(n) splits rows into n roughly equal buckets, e.g., NTILE(10) to find the products in the top 10% of sales in each category.
Lead/Lag Analysis: LEAD(), LAG() to compare a value with the value from a subsequent or preceding row, for example, to calculate month-over-month growth.
Example
SELECT month, revenue, revenue - LAG(revenue) OVER (ORDER BY month) AS growth FROM monthly_sales;
Running Totals: SUM() with an appropriate frame clause to calculate cumulative sales over time.
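A small sketch of both ideas on a hypothetical monthly_sales table, using SQLite via Python's sqlite3. The frame clause ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is what turns SUM() into a running total; the first month's growth is NULL (None in Python) because LAG has no earlier row to read.

```python
import sqlite3

# SQLite stands in for PostgreSQL; monthly_sales and its values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE monthly_sales (month TEXT, revenue INTEGER);
INSERT INTO monthly_sales VALUES ('2025-01', 100), ('2025-02', 120), ('2025-03', 90);
""")

rows = conn.execute("""
    SELECT month, revenue,
           revenue - LAG(revenue) OVER (ORDER BY month) AS growth,
           SUM(revenue) OVER (ORDER BY month
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
               AS running_total
    FROM monthly_sales
    ORDER BY month
""").fetchall()

for row in rows:
    print(row)
```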

A classic business use case is identifying the top-performing product in each category:

SELECT * FROM (SELECT product_name, category, revenue, RANK() OVER (PARTITION BY category ORDER BY revenue DESC) AS rnk FROM product_sales) AS ranked WHERE rnk = 1;
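Here is the query on a few invented rows, with SQLite through Python's sqlite3 standing in for the real database. The Furniture category contains a deliberate tie, and because RANK() gives tied rows the same rank, both tied products come back. Swap RANK() for ROW_NUMBER() if you need exactly one row per category.

```python
import sqlite3

# SQLite stands in for PostgreSQL; product_sales rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_sales (product_name TEXT, category TEXT, revenue INTEGER);
INSERT INTO product_sales VALUES
    ('Laptop X', 'Electronics', 500), ('Phone Y', 'Electronics', 800),
    ('Desk', 'Furniture', 300), ('Chair', 'Furniture', 300), ('Lamp', 'Furniture', 100);
""")

top = conn.execute("""
    SELECT product_name, category, revenue
    FROM (
        SELECT product_name, category, revenue,
               RANK() OVER (PARTITION BY category ORDER BY revenue DESC) AS rnk
        FROM product_sales
    ) AS ranked
    WHERE rnk = 1
    ORDER BY category, product_name
""").fetchall()

for row in top:
    print(row)
```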

4. Common Table Expressions (CTEs)

A CTE is a temporary, named result set that you can reference within a subsequent SELECT, INSERT, UPDATE, or DELETE statement.

Think of it as creating a temporary, virtual table that exists only for the duration of your query.

CTEs provide a way to break down a complex query into logical, readable steps. By giving a name to each step of your analysis, you can make your code self-documenting and easier to debug.

Understanding the Syntax
You start with the WITH keyword, followed by the name of the CTE, and then the AS keyword with the query that defines the CTE enclosed in parentheses.

Business Case Scenario: Analyzing Customer Order Data

Let's say you are an analyst for an online retailer, and you have been tasked with identifying the top 5 customers by total spending in the Kenya region for the year 2025.

Attempting to do this with nested subqueries would result in a hard-to-read query. Instead, we can use CTEs to break down the problem into clear, logical steps.

Assume you have the following tables:

Table 1: customers
customer_id  customer_name  region
1            John Smith     Kenya
2            Jane Doe       Uganda

Table 2: orders
order_id  customer_id  order_date  order_total
101       1            2025-01-15  250.00
102       2            2025-02-20  150.75

Here is how you can solve this problem using CTEs:

WITH orders_2025 AS (
    --Step 1: Filter orders from the year 2025--
    SELECT order_id, customer_id, order_total
    FROM orders
    WHERE EXTRACT(YEAR FROM order_date) = 2025),

kenya_customer_orders AS (
    --Step 2 & 3: Join with customers and filter for Kenyan region--
    SELECT c.customer_id, c.customer_name, o.order_total
    FROM orders_2025 AS o
    JOIN customers AS c ON o.customer_id = c.customer_id
    WHERE c.region = 'Kenya'),

customer_total_spending AS (
     --Step 4: Calculate total spending for each customer--
    SELECT customer_id, customer_name, 
    SUM(order_total) AS total_spending
    FROM kenya_customer_orders
    GROUP BY customer_id, customer_name)
--Step 5: Rank customers and select the top 5--
SELECT customer_name, total_spending
FROM customer_total_spending
ORDER BY total_spending DESC
LIMIT 5;

As you can see, the query is much more readable and self-explanatory. Each CTE has a clear purpose, and the final SELECT statement is simple and easy to understand. If you needed to debug this query, you could easily test each CTE independently to verify its output.
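To try the CTE pipeline end to end, the sketch below loads the two sample tables into SQLite via Python's sqlite3, plus a few extra invented rows so each step has something to filter. SQLite has no EXTRACT function, so Step 1 uses an equivalent date-range filter instead; the rest follows the query above.

```python
import sqlite3

# SQLite stands in for PostgreSQL; customers 3-4 and orders 103-105 are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT, region TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     order_date TEXT, order_total REAL);
INSERT INTO customers VALUES
    (1, 'John Smith', 'Kenya'), (2, 'Jane Doe', 'Uganda'),
    (3, 'Wanjiru', 'Kenya'), (4, 'Otieno', 'Kenya');
INSERT INTO orders VALUES
    (101, 1, '2025-01-15', 250.00), (102, 2, '2025-02-20', 150.75),
    (103, 3, '2025-03-02', 400.00), (104, 1, '2025-04-10', 100.00),
    (105, 4, '2024-12-31', 999.00);
""")

top_customers = conn.execute("""
    WITH orders_2025 AS (
        -- Step 1: keep only 2025 orders (date range instead of EXTRACT)
        SELECT order_id, customer_id, order_total
        FROM orders
        WHERE order_date >= '2025-01-01' AND order_date < '2026-01-01'
    ),
    kenya_customer_orders AS (
        -- Steps 2 & 3: attach customer details and keep the Kenya region
        SELECT c.customer_id, c.customer_name, o.order_total
        FROM orders_2025 AS o
        JOIN customers AS c ON o.customer_id = c.customer_id
        WHERE c.region = 'Kenya'
    ),
    customer_total_spending AS (
        -- Step 4: total per customer
        SELECT customer_id, customer_name, SUM(order_total) AS total_spending
        FROM kenya_customer_orders
        GROUP BY customer_id, customer_name
    )
    -- Step 5: highest spenders first, top 5 only
    SELECT customer_name, total_spending
    FROM customer_total_spending
    ORDER BY total_spending DESC
    LIMIT 5
""").fetchall()

print(top_customers)  # [('Wanjiru', 400.0), ('John Smith', 350.0)]
```

Jane Doe drops out at Step 2 (Uganda) and Otieno at Step 1 (2024 order only).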

CTEs also support recursion, which is useful for hierarchical data structures like org charts, product categories with parent-child relationships, or network graphs.

Example hierarchy
CEO
├─ Alice
│  ├─ Carol
│  └─ David
└─ Bob
   └─ Emma

Question: Find all employees under a specific manager.

employee_id name manager_id
1 CEO NULL
2 Alice 1
3 Bob 1
4 Carol 2
5 David 2
6 Emma 3

WITH RECURSIVE employee_tree AS (
    -- start from the manager
    SELECT employee_id, name, manager_id
    FROM employees
    WHERE employee_id = 1
    UNION ALL
    -- find subordinates
    SELECT e.employee_id, e.name, e.manager_id
    FROM employees e
    JOIN employee_tree et
    ON e.manager_id = et.employee_id)

SELECT * FROM employee_tree;
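The sketch below runs the recursive CTE in SQLite (via Python's sqlite3, standing in for PostgreSQL) and makes the starting manager a parameter, so you can ask the question for any manager. The depth column and the helper function subtree are hypothetical additions that show how many levels below the starting manager each person sits; the starting manager is included at depth 0.

```python
import sqlite3

# SQLite stands in for PostgreSQL; rows mirror the employees table above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
INSERT INTO employees VALUES
    (1, 'CEO', NULL), (2, 'Alice', 1), (3, 'Bob', 1),
    (4, 'Carol', 2), (5, 'David', 2), (6, 'Emma', 3);
""")


def subtree(manager_id):
    """Hypothetical helper: the manager plus everyone below them, with depth."""
    return conn.execute("""
        WITH RECURSIVE employee_tree AS (
            SELECT employee_id, name, manager_id, 0 AS depth
            FROM employees
            WHERE employee_id = ?
            UNION ALL
            SELECT e.employee_id, e.name, e.manager_id, et.depth + 1
            FROM employees AS e
            JOIN employee_tree AS et ON e.manager_id = et.employee_id
        )
        SELECT name, depth FROM employee_tree ORDER BY depth, employee_id
    """, (manager_id,)).fetchall()


print(subtree(1))
print(subtree(2))  # [('Alice', 0), ('Carol', 1), ('David', 1)]
```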

Example: Sales Analytics Query

Imagine this table:

order_id  customer  region  order_total  order_date
1         Alice     Kenya   500          2025-01-05
2         Brian     Kenya   700          2025-02-02
3         James     Uganda  300          2025-02-15
4         Alice     Kenya   400          2025-03-01

Goal:
Total spending per customer
Kenyan vs non-Kenyan totals
Rank customers by spending

--Step 1 — CTE (clean and organize the orders from 2025) --

WITH customer_orders AS (
SELECT customer, region, order_total
FROM orders
WHERE EXTRACT(YEAR FROM order_date) = 2025)

--Step 2 — FILTER (conditional aggregation)--

SELECT customer, SUM(order_total) AS total_spent,
SUM(order_total) FILTER (WHERE region = 'Kenya') AS kenya_spending,
SUM(order_total) FILTER (WHERE region != 'Kenya') AS international_spending

--Without FILTER, you'd need multiple CASE WHEN statements.--
--Step 3 — Window function (ranking)--
RANK() OVER (ORDER BY SUM(order_total) DESC) AS spending_rank

--This ranks customers based on their total spending.--

--Final Combined Query--

WITH customer_orders AS (
SELECT customer, region, order_total
FROM orders
WHERE EXTRACT(YEAR FROM order_date) = 2025)

SELECT customer, SUM(order_total) AS total_spent,
SUM(order_total) FILTER (WHERE region = 'Kenya') AS kenya_spending,
SUM(order_total) FILTER (WHERE region != 'Kenya') AS international_spending,
RANK() OVER (ORDER BY SUM(order_total) DESC) AS spending_rank

FROM customer_orders
GROUP BY customer;

Instead of writing 5 queries, you get everything in one query.
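Running the idea on the four sample orders confirms the numbers. This is a sketch using SQLite through Python's sqlite3 (FILTER needs SQLite 3.30 or later). To stay on safe ground in SQLite, the ranking is done in a second CTE over the already-aggregated totals rather than directly over SUM() in the same SELECT, and the year filter uses a date range instead of EXTRACT. In PostgreSQL the single-SELECT version above works as written.

```python
import sqlite3

# SQLite stands in for PostgreSQL; rows mirror the sample orders table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, region TEXT,
                     order_total INTEGER, order_date TEXT);
INSERT INTO orders VALUES
    (1, 'Alice', 'Kenya', 500, '2025-01-05'), (2, 'Brian', 'Kenya', 700, '2025-02-02'),
    (3, 'James', 'Uganda', 300, '2025-02-15'), (4, 'Alice', 'Kenya', 400, '2025-03-01');
""")

rows = conn.execute("""
    WITH customer_orders AS (
        SELECT customer, region, order_total
        FROM orders
        WHERE order_date >= '2025-01-01' AND order_date < '2026-01-01'
    ),
    totals AS (
        SELECT customer,
               SUM(order_total) AS total_spent,
               SUM(order_total) FILTER (WHERE region = 'Kenya')  AS kenya_spending,
               SUM(order_total) FILTER (WHERE region != 'Kenya') AS international_spending
        FROM customer_orders
        GROUP BY customer
    )
    SELECT customer, total_spent, kenya_spending, international_spending,
           RANK() OVER (ORDER BY total_spent DESC) AS spending_rank
    FROM totals
    ORDER BY spending_rank
""").fetchall()

for row in rows:
    print(row)
```

A NULL (None) spending column simply means that customer had no orders matching that filter.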

Mental model - the flow usually looks like this:
CTE
↓
Aggregate with FILTER
↓
Analyze with WINDOW FUNCTIONS
↓
Final output

5. Subqueries

A subquery is simply a query inside another query.

It’s like saying: First calculate this… then use that result to answer the main question.
They can be used in:

SELECT
FROM
WHERE
HAVING

They can also be correlated, meaning the inner query refers to a column of the outer query.

a. Subquery in the WHERE Clause (Most Common)
Used for filtering.

Example:
Find employees who earn more than the average salary.

SELECT name, salary FROM employees WHERE salary > (SELECT AVG(salary) FROM employees);

First, the inner query calculates the average salary. Then the outer query keeps only the employees above that value.

b. Subquery in the FROM Clause (Derived Table)
Here, the subquery acts like a temporary table.

Example:

Find departments with average salary above 50,000.

SELECT department_id, avg_salary FROM ( SELECT department_id, AVG(salary) AS avg_salary FROM employees GROUP BY department_id) AS dept_avg WHERE avg_salary > 50000;

c. Subquery in the SELECT Clause
Used to return a calculated value per row.

Example:
Show each employee with the company average salary.
SELECT name, salary, (SELECT AVG(salary) FROM employees) AS company_avg FROM employees;
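The three placements above are easiest to compare on one small dataset. In this sketch, SQLite via Python's sqlite3 stands in for your database and the four employees are invented: the company average is 55,000, department 1 averages exactly 50,000 (so the strict > 50000 test leaves it out), and department 2 averages 60,000.

```python
import sqlite3

# SQLite stands in for PostgreSQL; the employees are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, salary INTEGER, department_id INTEGER);
INSERT INTO employees VALUES
    ('Ana', 40000, 1), ('Ben', 60000, 1), ('Cleo', 70000, 2), ('Dan', 50000, 2);
""")

# a. WHERE: scalar subquery used as a filter value.
above_avg = conn.execute("""
    SELECT name, salary FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY name
""").fetchall()

# b. FROM: derived table of department averages.
high_depts = conn.execute("""
    SELECT department_id, avg_salary
    FROM (SELECT department_id, AVG(salary) AS avg_salary
          FROM employees GROUP BY department_id) AS dept_avg
    WHERE avg_salary > 50000
""").fetchall()

# c. SELECT: the same value repeated on every row.
with_company_avg = conn.execute("""
    SELECT name, salary, (SELECT AVG(salary) FROM employees) AS company_avg
    FROM employees ORDER BY name
""").fetchall()

print(above_avg)   # [('Ben', 60000), ('Cleo', 70000)]
print(high_depts)  # [(2, 60000.0)]
```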

d. Subquery in the HAVING Clause
It is used when you want to filter aggregated results using another query.

WHERE filters rows before aggregation
HAVING filters groups after aggregation

So a subquery in HAVING usually compares one group's aggregate with another aggregate value.

SELECT department_id, AVG(salary) AS dept_avg FROM employees GROUP BY department_id HAVING AVG(salary) > ( SELECT AVG(salary) FROM employees);

Real Business Case: Find products whose total sales are greater than the average product sales.

SELECT product_id, SUM(sales) AS total_sales FROM orders GROUP BY product_id HAVING SUM(sales) > (SELECT AVG(product_sales) FROM(SELECT SUM(sales) AS product_sales FROM orders GROUP BY product_id) t);
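The nested subquery is easiest to follow with numbers. In this sketch (SQLite via Python's sqlite3 as a stand-in, with invented rows), the product totals are 150, 40, and 210, so the average product total is 400 / 3 ≈ 133.33 and only products 1 and 3 clear it. This is the average of per-product totals, not of individual order rows, which is why the innermost query groups first.

```python
import sqlite3

# SQLite stands in for PostgreSQL; order rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id INTEGER, sales INTEGER);
INSERT INTO orders (product_id, sales) VALUES (1, 100), (1, 50), (2, 40), (3, 200), (3, 10);
""")

# Average of the per-product totals (the inner two levels of the query).
avg_product_total = conn.execute("""
    SELECT AVG(product_sales)
    FROM (SELECT SUM(sales) AS product_sales FROM orders GROUP BY product_id) AS t
""").fetchone()[0]

above_average = conn.execute("""
    SELECT product_id, SUM(sales) AS total_sales
    FROM orders
    GROUP BY product_id
    HAVING SUM(sales) > (
        SELECT AVG(product_sales)
        FROM (SELECT SUM(sales) AS product_sales FROM orders GROUP BY product_id) AS t
    )
    ORDER BY product_id
""").fetchall()

print(round(avg_product_total, 2))  # 133.33
print(above_average)                # [(1, 150), (3, 210)]
```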

e. Correlated Subqueries
A correlated subquery depends on the outer query. It references a column from the outer query, so conceptually it runs once per outer row (the optimizer may rewrite it into something faster).

Example:
Find employees who earn more than their department average.

SELECT e1.name, e1.salary, e1.department_id FROM employees e1 WHERE salary > ( SELECT AVG(e2.salary) FROM employees e2 WHERE e2.department_id = e1.department_id );
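The invented data in this sketch is chosen so that "above the department average" and "above the company average" give different answers: Ben is well paid for department 1 but below the company average, while Dan is above the company average but below his department's. SQLite via Python's sqlite3 stands in for PostgreSQL. The last query answers the same question with AVG() OVER (PARTITION BY ...), a window-function alternative that computes each department average once.

```python
import sqlite3

# SQLite stands in for PostgreSQL; the employees are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, salary INTEGER, department_id INTEGER);
INSERT INTO employees VALUES
    ('Ana', 30000, 1), ('Ben', 40000, 1), ('Cleo', 90000, 2), ('Dan', 70000, 2);
""")

# Correlated: the inner AVG is restricted to the outer row's department.
above_dept = conn.execute("""
    SELECT e1.name, e1.salary, e1.department_id
    FROM employees e1
    WHERE e1.salary > (SELECT AVG(e2.salary) FROM employees e2
                       WHERE e2.department_id = e1.department_id)
    ORDER BY e1.name
""").fetchall()

# Non-correlated, for contrast: one company-wide average (57,500).
above_company = conn.execute("""
    SELECT name FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY name
""").fetchall()

# Window-function version of the correlated question.
above_dept_window = conn.execute("""
    SELECT name, salary, department_id
    FROM (SELECT name, salary, department_id,
                 AVG(salary) OVER (PARTITION BY department_id) AS dept_avg
          FROM employees) AS t
    WHERE salary > dept_avg
    ORDER BY name
""").fetchall()

print(above_dept)     # [('Ben', 40000, 1), ('Cleo', 90000, 2)]
print(above_company)  # [('Cleo',), ('Dan',)]
```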

6. Finding Relationships with EXISTS and NOT EXISTS

EXISTS checks whether a subquery returns at least one row. It ignores the actual column values.

Use EXISTS and NOT EXISTS when:

Checking relationships
Finding missing records
Data validation
Complex filtering
Working with large datasets
Avoiding NULL issues

Example
Table 1: customers
customer_id  customer_name
1            Alice
2            Bob
3            Carol
4            David

Table 2: orders

order_id  customer_id  amount
101       1            500
102       1            300
103       2            700

Now let’s use them.
i. EXISTS
Find customers who have placed at least one order.

SELECT c.customer_id, c.customer_name FROM customers c WHERE EXISTS ( SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id);

Result:
customer_id  customer_name
1            Alice
2            Bob

ii. NOT EXISTS
Now let’s find customers who NEVER placed an order.

SELECT c.customer_id, c.customer_name FROM customers c WHERE NOT EXISTS ( SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id);

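NOTE BETTER:

The SELECT 1 in both queries is just a placeholder; you could select any value, because EXISTS only checks whether a row comes back and never reads the column.
NOT EXISTS (recommended) and LEFT JOIN + IS NULL give the same result. NOT IN is the one to watch: if the subquery returns even one NULL, NOT IN returns no rows at all, which is the NULL issue NOT EXISTS avoids.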
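The sketch below runs all four approaches on the example tables in SQLite (via Python's sqlite3, standing in for PostgreSQL). It adds one hypothetical order with a NULL customer_id to show the NULL trap: EXISTS, NOT EXISTS, and the LEFT JOIN behave as expected, while NOT IN suddenly returns nothing.

```python
import sqlite3

# SQLite stands in for PostgreSQL; order 104 (NULL customer) is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol'), (4, 'David');
INSERT INTO orders VALUES (101, 1, 500), (102, 1, 300), (103, 2, 700), (104, NULL, 50);
""")


def names(sql):
    """Return the first column of every result row as a list."""
    return [row[0] for row in conn.execute(sql)]


has_orders = names("""
    SELECT c.customer_name FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id)
    ORDER BY c.customer_id""")

no_orders = names("""
    SELECT c.customer_name FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id)
    ORDER BY c.customer_id""")

no_orders_join = names("""
    SELECT c.customer_name FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.customer_id
    WHERE o.order_id IS NULL
    ORDER BY c.customer_id""")

# The NULL in orders.customer_id makes every NOT IN test unknown.
no_orders_not_in = names("""
    SELECT c.customer_name FROM customers c
    WHERE c.customer_id NOT IN (SELECT customer_id FROM orders)""")

print(has_orders, no_orders, no_orders_join, no_orders_not_in)
```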

7. Query Optimization

Writing SQL that produces the correct answer is one thing.
Writing SQL that runs efficiently on millions of rows is another skill entirely.

Query optimization is the process of writing SQL in a way that runs faster, uses less memory, and scales better on large data.

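Key techniques:

Index the columns you filter or join on. If you frequently filter or join on a column, it’s likely a good candidate for an index.
Example:

SELECT * FROM orders o JOIN customers c ON o.customer_id = c.customer_id;

orders.customer_id should likely be indexed (customers.customer_id is usually the primary key, which the database indexes automatically).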

How Databases Actually Work
When you run a query:

The database parses it (checks syntax).
The query optimizer analyzes it.
It chooses the best execution plan.
Then it runs it.

The optimizer decides things like:

Should it use an index?
Should it scan the whole table?
Which join order is best?
Should it use a hash join or nested loop?

You don’t see this — but it’s happening.

Key principles include:

1. Use EXPLAIN or EXPLAIN ANALYZE
This reveals how the database executes your query; EXPLAIN ANALYZE also runs it and reports actual row counts and timings.

2. Use indexes on frequently filtered columns
Indexes work like the index at the back of a book. Without an index, the database scans every row (slow); with an index, it can jump directly to the matching rows (fast).

3. Avoid SELECT * in production queries
Select only the columns you need: less data is transferred and less memory is used, so the query runs faster.

4. Filter early using WHERE clauses
Filtering early reduces the number of rows processed.

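5. Avoid functions on indexed columns
Wrapping a column in a function usually prevents the database from using a plain index on that column.

Example:
Works, but cannot use a plain index on order_date:
WHERE YEAR(order_date) = 2024   (MySQL/SQL Server syntax; in PostgreSQL: EXTRACT(YEAR FROM order_date) = 2024)

Better:
WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01'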
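You can watch the optimizer make this choice between a year function and a plain date range. The sketch below uses SQLite's EXPLAIN QUERY PLAN (PostgreSQL's EXPLAIN output looks different but tells the same story) on a hypothetical orders table with an index on order_date. SQLite has no YEAR() function, so strftime('%Y', ...) plays that role here. With the function wrapped around the column, SQLite reports a full SCAN; with the date range, it reports a SEARCH using the index.

```python
import sqlite3

# SQLite stands in for PostgreSQL; the orders table and index are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     order_date TEXT, amount REAL);
CREATE INDEX idx_orders_order_date ON orders (order_date);
""")


def plan(sql: str) -> str:
    """Join the 'detail' column (4th field) of EXPLAIN QUERY PLAN rows."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


with_function = plan(
    "SELECT * FROM orders WHERE strftime('%Y', order_date) = '2024'"
)
with_range = plan(
    "SELECT * FROM orders "
    "WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01'"
)

print(with_function)  # a SCAN of orders, no index
print(with_range)     # a SEARCH using idx_orders_order_date
```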

NB/ Indexes are NOT helpful for:

Small tables
Queries returning most of the table
Columns with very low uniqueness (like gender: M/F)
Tables with heavy writes (because every index slows down inserts and updates)

Discussion

What SQL concept took you the longest to understand?

Window functions?
CTEs?
Subqueries?
Something else?

Let me know.

Why Data Modeling Matters in Power BI: A Beginner’s Guide to Schemas, Facts, and Dimensions

Isika Millicent — Sun, 01 Feb 2026 22:53:11 +0000

Power BI Is Not About Charts - It’s About Data Modeling

When people first open Power BI, they often focus on charts, dashboards, and visuals. That’s understandable, visuals are what you see first.
But the real power of Power BI doesn’t come from visuals. It comes from how your data is structured behind the scenes.

Power BI is first and foremost a data modeling engine. Visuals are only the surface layer.

What Is Data Modeling in Power BI?

Data modeling is the process of organizing your data so Power BI can understand it efficiently.

A good data model gives Power BI a logical map of your data. It tells the engine: “These are numbers. These are descriptions. This is how they connect.”

Think of data modeling as the architectural blueprint of a building. If the foundation is weak, the building may look fine at first, but it won’t last. The same is true for Power BI.

This guide covers the essential foundations of data modeling in simple terms, focusing on:

A. Star schema & Snowflake schema

What Is a Schema?
A schema is the structure or blueprint of how data is organized in a system. It answers one core question: How should Power BI understand this data?

The schema is the logic behind your report. It’s the map that shows:

What tables exist
What columns are in each table
How tables connect to each other
How data is arranged

A schema does not hold the data itself. It defines how the data is organized.

Instead of one giant table, the data is split into logical pieces. This structure allows Power BI to work efficiently and correctly.

i) Star schema
When fact and dimension tables connect correctly, they form a star-shaped structure.

This is called a star schema. The schema is literally the shape of your model.

The fact table sits in the center. Dimension tables surround it. This layout is simple, clean, and optimized for analytics. It is the preferred structure for most Power BI models because it improves performance and makes reporting more predictable.

Each dimension table contains descriptive data used for filtering, grouping, and slicing (e.g., product names, customer regions).

ii) Snowflake Schema

A Snowflake Schema is like a star schema but with dimension tables normalized — i.e., broken into several linked tables.

For example:

Product Dimension → Product → Product Category → Product Brand
Location Dimension → City → Region → Country

Instead of one dimension table, dimensions are split into multiple linked tables. While this reduces redundancy, it introduces additional complexity. Snowflake schemas are still valid — they are simply more structured and less straightforward than star schemas.

NOTE BETTER
Power BI performs best with star schemas because

Fast Query Performance – Fewer joins improve speed because Power BI’s engine runs aggregation queries efficiently.
Simpler DAX & Reporting – Clear structure makes writing measures and visuals straightforward.
Easy for Users to Understand – Business users and analysts can use models intuitively.

Why Schemas Matter
Without a schema:

Power BI guesses relationships
totals become unreliable
filtering breaks
performance slows
reports become confusing

With a good schema:

numbers aggregate correctly
filters behave logically
dashboards load faster
reports scale cleanly

Real-World Example of a Schema
Imagine a retail company that sells products online. Management wants to analyze sales by customer, product, and time.

At first, the company stores everything in one giant spreadsheet:

This works when the business is small. But as sales grow into millions of rows, problems appear:

repeated customer names
repeated product descriptions
slow performance
inconsistent calculations
hard-to-maintain reports

B. Fact and dimension tables

Every strong Power BI model follows one fundamental principle: Separate numbers from descriptions.

1) Fact Table:
Holds numeric, measurable metrics (e.g., sales, revenue, quantity, cost, transactions).
Fact tables tend to be large because businesses generate many events. They answer one question: What happened?

2) Dimension Tables:
Contain descriptive attributes (e.g., Product Name, Customer City, customer names, product categories, regions, dates etc).
Are usually smaller and denormalized (not split into highly normalized tables).

They provide context and answer: Who? What? Where? When?

Dimension tables drive filtering and grouping in reports.

Examples:
Customers → name, city
Products → product name, category
Dates → year, month, day

These dimension tables connect to the fact table through IDs.

C. Relationships

Relationships define how tables connect and how filters propagate between them.

Key Concepts

Cardinality:
a) One-to-Many (1:*): Most common and preferred (one dimension value → many facts).
b) One-to-One (1:1): Rare; usually when two tables have exactly matching rows.
c) Many-to-Many (*:*): Possible, but can cause incorrect aggregations and slow performance.

NOTE BETTER:
Active vs. Inactive Relationships:
Power BI allows multiple relationships between two tables, but only one can be active at a time. Inactive ones can be used in measures with the DAX function USERELATIONSHIP.

Filter Direction:

Usually single-direction — dimension filters fact data.
Bi-directional filtering can be used, but sparingly, since it increases complexity and processing and can create ambiguous filter paths.

D. Why modeling is critical for performance and accurate reporting

Why Good Modeling Is Critical

A poor model can cause:

Messy Formulas
Duplicated counts
Slow dashboards
Broken filters
Confusing visuals
Unreliable insights
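Duplicated counts often come from combining tables at different grains. In this sketch (SQLite via Python's sqlite3; the sales and product_tags tables are hypothetical), product 1 has two tags, so flattening the two tables with a plain SQL join repeats its sale and the joined total overstates revenue. Keeping sales as the fact table and the tags in their own table avoids that.

```python
import sqlite3

# SQLite used for illustration; sales and product_tags are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER, amount INTEGER);
CREATE TABLE product_tags (product_id INTEGER, tag TEXT);
INSERT INTO sales VALUES (1, 1, 100), (2, 2, 50);
INSERT INTO product_tags VALUES (1, 'Promo'), (1, 'New'), (2, 'Promo');
""")

true_total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]

# Flattening: each sale is repeated once per matching tag row.
flattened_total = conn.execute("""
    SELECT SUM(s.amount)
    FROM sales AS s
    JOIN product_tags AS t ON s.product_id = t.product_id
""").fetchone()[0]

print(true_total, flattened_total)  # 150 250
```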

Therefore, before building visuals, always:

Identify the fact table (numbers/events)
Identify dimension tables (descriptions)
Create clean one-to-many relationships
Remove duplicate or unnecessary tables

Model first. Visualize second.

If you remember only one rule, remember this:
Separate numbers from descriptions and connect them in a clean star schema.

That single principle prevents many common beginner problems in Power BI, such as broken filters, incorrect totals, and slow reports. It turns Power BI from a charting tool into a reliable analytics engine.

That’s where real Power BI begins.

How MS Excel Can Be Used for Basic Data Analysis – Beginner-Friendly Guide

Isika Millicent — Sun, 25 Jan 2026 19:47:30 +0000

Imagine Excel as a super-smart notebook. You can type numbers, words, or dates in it, but it can also do math, spot patterns, and summarize information for you. That’s what “data analysis” is all about: making sense of data.

Here’s how it can be used for basic data analysis:

1. Organize Data

In Excel, each piece of information goes into a cell (the intersection of a row and a column).
Use columns for categories (like “Name,” “Age,” or “Sales”) and rows for individual entries (like each person or each sale).

2. Sort and Filter

Sort helps you arrange your data from smallest to largest, or alphabetically.
Filter lets you see only the data you care about (like all sales above 200).
This makes it easy to spot trends or specific entries.
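If you also work in Python, the same two operations look like this. It is a minimal sketch with hypothetical names and amounts, not something Excel itself runs.

```python
# Hypothetical sales rows: (name, amount).
sales = [("Amina", 250), ("Brian", 120), ("Chen", 310), ("Dana", 90)]

by_amount = sorted(sales, key=lambda row: row[1])     # Sort: smallest to largest
alphabetical = sorted(sales, key=lambda row: row[0])  # Sort: A to Z by name
above_200 = [row for row in sales if row[1] > 200]    # Filter: only sales above 200

print(by_amount)
print(above_200)
```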

3. Basic Calculations

Excel can do math automatically. You can use:

Sum (=SUM(B2:B10)) → adds up numbers in a column.
Average (=AVERAGE(B2:B10)) → finds the mean.
Min / Max (=MIN(B2:B10) or =MAX(B2:B10)) → finds the smallest or largest number.

Example: To get total sales from the table above, you’d type =SUM(C2:C4) → gives 650.
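For comparison, here is a minimal Python sketch of the same four calculations over one column. The nine values are hypothetical and simply stand in for cells B2:B10. Unlike this list, a real Excel range may contain blanks or text, which SUM, AVERAGE, MIN and MAX skip.

```python
# Hypothetical values standing in for cells B2:B10.
column_b = [120, 250, 80, 300, 150, 90, 210, 60, 180]

total = sum(column_b)               # like =SUM(B2:B10)
average = total / len(column_b)     # like =AVERAGE(B2:B10)
smallest = min(column_b)            # like =MIN(B2:B10)
largest = max(column_b)             # like =MAX(B2:B10)

print(total, average, smallest, largest)
```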

4. Use Basic Functions for Quick Insights

COUNT (=COUNT(A2:A10)) → counts how many cells in the range contain numbers (use COUNTA to count all non-empty cells).
COUNTIF (=COUNTIF(B2:B10, ">25")) → counts entries meeting a condition.

These help summarize data quickly.
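A minimal Python sketch of how these two functions behave, using hypothetical cells where None stands for an empty cell. The helpers excel_count and count_greater_than are hypothetical names. The sketch shows a detail beginners often miss: COUNT only counts cells that hold numbers.

```python
# Hypothetical cells: A2:A10 (mixed content) and B2:B10 (ages). None = empty cell.
cells_a = [5, "n/a", 12, None, 7, "text", 3, None, 40]
ages_b = [22, 31, 25, 40, 19, 28, 35, 25, 18]

def is_number(cell):
    # True/False are not counted as numbers here.
    return isinstance(cell, (int, float)) and not isinstance(cell, bool)

def excel_count(cells):
    """Hypothetical helper mimicking =COUNT(range): text and blanks are skipped."""
    return sum(1 for c in cells if is_number(c))

def count_greater_than(cells, threshold):
    """Hypothetical helper mimicking =COUNTIF(range, ">threshold") on numeric cells."""
    return sum(1 for c in cells if is_number(c) and c > threshold)

print(excel_count(cells_a))            # 5
print(count_greater_than(ages_b, 25))  # 4
```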

5. Visualize Data with Charts

Excel can turn numbers into charts to make them easier to understand:
Bar charts → compare amounts.
Line charts → show trends over time.
Pie charts → show parts of a whole.

6. Spot Trends with Conditional Formatting

You can highlight cells that meet certain conditions.
Example: Highlight all sales over 200 → Excel can color those cells automatically.
This helps you see patterns without reading every number.

Data Cleaning in Excel (Beginner-Friendly Guide)

Raw data is almost always messy. Excel helps you clean it without needing coding skills.

1. Removing Blank Cells & Rows
Blank rows can break Excel's automatic detection of your data range, so sorting, filtering, and summaries may miss part of the data.

How to do it:

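Select your data
Go to Data → Filter
Click the filter arrow on a column that should never be empty → uncheck (Select All), then check only (Blanks)
Select the visible blank rows and delete them (Home → Delete → Delete Sheet Rows)
Clear the filter (Data → Clear) to show the remaining rows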
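The same clean-up expressed in Python, as a minimal sketch on hypothetical data with a hypothetical helper is_blank. One assumption differs from the filter steps: here a row is treated as blank only when every cell in it is empty or contains only spaces.

```python
# Hypothetical table: header row plus data rows, some of them blank.
rows = [
    ["Name", "Sales"],
    ["Amina", 250],
    ["", ""],
    ["Brian", 120],
    [None, "  "],
]

def is_blank(row):
    """Hypothetical helper: a row is blank when every cell is None or whitespace."""
    return all(cell is None or str(cell).strip() == "" for cell in row)

cleaned = [row for row in rows if not is_blank(row)]
print(cleaned)  # [['Name', 'Sales'], ['Amina', 250], ['Brian', 120]]
```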

2. Fixing Inconsistent Text

Useful tools:

UPPER() → makes text all caps
LOWER() → makes text lowercase
PROPER() → capitalizes first letters
TRIM() → removes leading and trailing spaces and reduces repeated spaces between words to one

Example problem: the same city typed as “Nairobi”, “nairobi”, and “NAIROBI”.
Fix: =TRIM(PROPER(A2)) returns “Nairobi” for all three.
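A rough Python stand-in for =TRIM(PROPER(A2)), shown as a sketch rather than an exact equivalent (trim_proper is a hypothetical name): str.title() behaves like PROPER for ordinary words, and " ".join(text.split()) behaves like TRIM, except that it also collapses tabs and line breaks, which Excel's TRIM leaves alone.

```python
def trim_proper(text):
    """Hypothetical helper approximating =TRIM(PROPER(cell))."""
    # split() drops leading/trailing whitespace and splits on runs of whitespace;
    # joining with one space mimics TRIM, and title() mimics PROPER.
    return " ".join(text.split()).title()

cities = ["Nairobi", "nairobi", "NAIROBI", "  nairobi  ", "new   york"]
print([trim_proper(c) for c in cities])
# ['Nairobi', 'Nairobi', 'Nairobi', 'Nairobi', 'New York']
```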

3. Removing Duplicates
Duplicate data can give wrong totals and averages.

Steps:

Select your data
Go to Data → Remove Duplicates
Choose the column(s) Excel should compare when looking for duplicates
Click OK
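A minimal Python sketch of the same idea with hypothetical order rows and a hypothetical helper remove_duplicates: keep the first occurrence of each row, judging duplicates only on the columns you choose. It mirrors what ticking specific columns in the Remove Duplicates dialog does, but it is an illustration, not Excel's implementation.

```python
def remove_duplicates(rows, key_columns):
    """Hypothetical helper: keep the first row for each combination of key_columns."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[i] for i in key_columns)
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result

# Hypothetical orders: (order_id, customer, amount).
orders = [
    ("1001", "Amina", 250),
    ("1002", "Brian", 120),
    ("1001", "Amina", 250),   # exact duplicate of the first row
    ("1003", "Amina", 250),
]
print(remove_duplicates(orders, key_columns=[0]))     # compare on order ID only
print(remove_duplicates(orders, key_columns=[1, 2]))  # compare on customer + amount
```

Choosing different columns changes what counts as a duplicate: comparing on the order ID keeps three rows, while comparing on customer and amount keeps only two.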

4. Splitting Data into Columns

Example:
Millicent Isika → First Name | Last Name

Steps:

Select the column
Go to Data → Text to Columns
Choose Delimited
Select the delimiter (Space or Comma)
Click Finish
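In Python, a minimal sketch of the Delimited option is a single split per cell, where every delimiter starts a new column. The function name text_to_columns is hypothetical.

```python
def text_to_columns(values, delimiter=" "):
    """Hypothetical helper: split each cell on the delimiter, one piece per column."""
    return [value.split(delimiter) for value in values]

print(text_to_columns(["Millicent Isika"]))         # [['Millicent', 'Isika']]
print(text_to_columns(["Isika,Millicent"], ","))    # [['Isika', 'Millicent']]
```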

Finally: Building a Simple Dashboard in Excel

A simple Excel dashboard can:

Track performance
Show trends
Support decisions

Step 1: A good dashboard starts with clean data.
Your data should:

Have clear column headers
Contain no blank rows
Hold one type of data per column

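Step 2: Convert the data into an Excel Table (Ctrl + T). Formulas and charts that reference the table expand automatically as new rows are added, which keeps the dashboard dynamic.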
Step 3: Create Key Metrics (KPIs)
Step 4: Create Charts & Slicers
Step 5: Arrange the Dashboard Layout
