Collins Njeru

Posted on Apr 17

SQL IS NOT BAD - It's Running the World And Most Developers Don't Know Half of It....

"Every database, every app, every swipe of your card — SQL is there."

The language that outlived every trend since 1974

Written for developers, analysts, and data enthusiasts at every level

From zero to advanced — in one article

Last updated: 2026 | PostgreSQL-first, broadly compatible

"In an era obsessed with the newest JavaScript framework or the latest AI buzzword, SQL has done something remarkable , it has outlived every trend thrown at it since 1974."*

NoSQL was supposed to kill it.

Big Data was supposed to replace it.

Pandas, Spark, and cloud warehouses were going to make it irrelevant.

Yet here we are. SQL is baked into everything from PostgreSQL and MySQL to Snowflake, BigQuery, and DuckDB.

~~SQL is dying.~~ ← False. SQL is thriving.

If you work with data — in any capacity — SQL fluency is not optional. It's oxygen.

The Anatomy of a SQL Query
JOINs Demystified
Aggregation and GROUP BY
Window Functions — SQL's Hidden Superpower
Subqueries vs CTEs
Indexing — Why Your Queries Are Slow
Transactions and ACID
10 SQL Mistakes Hurting Your Performance
Advanced SQL Patterns
The Future of SQL

1. The Anatomy of a SQL Query

Before we run, let's walk. A SQL query has a very specific logical execution order that trips up even experienced developers.

The order you write a query:

SELECT name, COUNT(*) AS total_orders
FROM customers
JOIN orders ON customers.id = orders.customer_id
WHERE customers.country = 'Kenya'
GROUP BY name
HAVING COUNT(*) > 5
ORDER BY total_orders DESC
LIMIT 10;

The order the database executes it:

1. FROM       → Identify the tables
2. JOIN       → Combine related data
3. WHERE      → Filter rows (before grouping)
4. GROUP BY   → Collapse rows into groups
5. HAVING     → Filter groups (after aggregation)
6. SELECT     → Choose columns to return
7. ORDER BY   → Sort the result
8. LIMIT      → Cut result to N rows

Key insight: WHERE runs before GROUP BY. This means you cannot filter on an aggregated column using WHERE. That's what HAVING is for.

Nested tip: Get this order wrong, and you'll spend hours debugging a query that looks perfectly fine.

What You Should Know Before Writing Any Query

Always understand the shape of your data first
Never assume column types — check them
Use EXPLAIN before running on large tables
Limit your results while exploring

Your SQL Query Checklist

[x] Understand the execution order
[x] Know which table is the "driving" table
[x] Add a LIMIT when exploring
[ ] Run EXPLAIN ANALYZE to check the query plan
[ ] Check indexes on JOIN and WHERE columns
[ ] Peer-review complex queries before production

2. JOINs Demystified

JOINs are the heart of relational databases. If you understand them deeply, you understand SQL.

The Main JOIN Types

JOIN Type	What It Returns	Use Case
`INNER JOIN`	Rows matching in both tables	Most common join
`LEFT JOIN`	All left rows + matched right rows	Find unmatched records
`RIGHT JOIN`	All right rows + matched left rows	Rare — prefer LEFT JOIN
`FULL OUTER JOIN`	All rows from both tables	Complete data comparison
`CROSS JOIN`	Cartesian product of both tables	Combinations/permutations
`SELF JOIN`	A table joined to itself	Hierarchies, org charts

Visual Mental Model

Think of two overlapping circles:

INNER JOIN → only the overlap
LEFT JOIN → entire left circle + overlap
FULL OUTER JOIN → both circles entirely

A Common Real-World Example

-- Find customers who have NEVER placed an order
SELECT c.name, c.email
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE o.customer_id IS NULL;

The trick of using LEFT JOIN ... WHERE right_table.id IS NULL to find non-matching rows is one of the most elegant patterns in SQL. Learn it. Use it often.

The SELF JOIN — More Useful Than You Think

-- Find employees and their managers (from the same table)
SELECT 
    e.name AS employee,
    m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;

3. Aggregation and GROUP BY

Aggregation is where SQL transforms raw rows into business intelligence.

Core Aggregate Functions

SELECT
    department,
    COUNT(*)                    AS total_staff,
    AVG(salary)                 AS avg_salary,
    MAX(salary)                 AS highest_salary,
    MIN(salary)                 AS lowest_salary,
    SUM(salary)                 AS total_payroll,
    COUNT(DISTINCT job_title)   AS unique_roles
FROM employees
GROUP BY department
ORDER BY total_payroll DESC;

The Golden Rule of GROUP BY

Every column in SELECT must either be in GROUP BY or be inside an aggregate function.

Wrong vs Right

--  WRONG — 'name' is neither grouped nor aggregated
SELECT department, name, COUNT(*)
FROM employees
GROUP BY department;

--  CORRECT
SELECT department, COUNT(*) AS headcount
FROM employees
GROUP BY department;

Filtering Groups with HAVING

-- Only departments with more than 10 employees earning above $50,000
SELECT department, COUNT(*) AS headcount
FROM employees
WHERE salary > 50000          -- filters rows BEFORE grouping
GROUP BY department
HAVING COUNT(*) > 10          -- filters groups AFTER aggregation
ORDER BY headcount DESC;

4. Window Functions — SQL's Hidden Superpower

If there is one feature that separates SQL beginners from SQL professionals, it is window functions.

Window functions let you perform calculations across related rows — without collapsing the result like GROUP BY does.

The Syntax

function_name() OVER (
    PARTITION BY column     -- Optional: group rows into windows
    ORDER BY column         -- Optional: define order within the window
    ROWS BETWEEN ...        -- Optional: define the window frame
)

Ranking Functions

SELECT
    name,
    department,
    salary,
    RANK()        OVER (PARTITION BY department ORDER BY salary DESC) AS rank,
    DENSE_RANK()  OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rank,
    ROW_NUMBER()  OVER (PARTITION BY department ORDER BY salary DESC) AS row_num
FROM employees;

Difference Between Ranking Functions

Function	Handles Ties	Example Output
`ROW_NUMBER()`	Always unique	1, 2, 3, 4
`RANK()`	Skips after tie	1, 1, 3, 4
`DENSE_RANK()`	No skip after tie	1, 1, 2, 3

Running Totals and Moving Averages

-- Running total of sales with a 7-day moving average
SELECT
    sale_date,
    amount,
    SUM(amount) OVER (ORDER BY sale_date) AS running_total,
    AVG(amount) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_7day_avg
FROM sales;

LAG and LEAD — Comparing to Neighboring Rows

-- Month-over-month revenue growth
SELECT
    month,
    revenue,
    LAG(revenue, 1)  OVER (ORDER BY month) AS prev_month,
    LEAD(revenue, 1) OVER (ORDER BY month) AS next_month,
    revenue - LAG(revenue, 1) OVER (ORDER BY month) AS growth
FROM monthly_revenue;

Window functions are the reason SQL is still king for analytics. No other tool lets you express this kind of computation as cleanly.

5. Subqueries vs CTEs

The Subquery — Powerful but Messy at Scale

A query nested inside another query:

SELECT name, total_spent
FROM (
    SELECT customer_id, SUM(amount) AS total_spent
    FROM orders
    GROUP BY customer_id
) AS customer_totals
JOIN customers ON customers.id = customer_totals.customer_id
WHERE total_spent > (
    SELECT AVG(total_spent) FROM (
        SELECT SUM(amount) AS total_spent
        FROM orders
        GROUP BY customer_id
    ) AS sub
);

This works — but grows unreadable fast.

The CTE (Common Table Expression) — Clean, Named, Reusable

CTEs let you name intermediate results using the WITH keyword:

WITH customer_totals AS (
    SELECT customer_id, SUM(amount) AS total_spent
    FROM orders
    GROUP BY customer_id
),
avg_spend AS (
    SELECT AVG(total_spent) AS avg_val FROM customer_totals
)
SELECT 
    c.name, 
    ct.total_spent
FROM customers c
JOIN customer_totals ct ON c.id = ct.customer_id
CROSS JOIN avg_spend
WHERE ct.total_spent > avg_spend.avg_val
ORDER BY ct.total_spent DESC;

Prefer CTEs over nested subqueries for anything beyond one level. Your teammates — and your future self — will thank you.

Recursive CTEs — Conquering Hierarchical Data

-- Traverse an employee org chart tree
WITH RECURSIVE org_chart AS (
    -- Base case: the CEO (no manager)
    SELECT id, name, manager_id, 0 AS depth
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive case: employees joined to their managers
    SELECT e.id, e.name, e.manager_id, oc.depth + 1
    FROM employees e
    JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT depth, name FROM org_chart ORDER BY depth, name;

When to Use Each

Subquery — one-off, simple, single-level nesting
CTE — multi-step logic, readable complex queries
Recursive CTE — trees, graphs, hierarchies

6. Indexing — Why Your Queries Are Slow

No SQL article is complete without talking about performance. And no performance topic matters more than indexing.

What Is an Index?

An index is a separate data structure (usually a B-tree) that lets the database find rows without scanning every record. Think of it as the index in the back of a textbook.

The Speed Difference Is Staggering

-- Without index — scanning 50 million rows
SELECT * FROM customers WHERE email = 'alice@example.com';
-- ⏱ Time: 8.4 seconds

-- Create the index
CREATE INDEX idx_customers_email ON customers(email);

-- With index — direct lookup
SELECT * FROM customers WHERE email = 'alice@example.com';
-- ⏱ Time: 0.003 seconds

When to Create an Index

Columns used frequently in WHERE clauses
Columns used in JOIN conditions
Columns used in ORDER BY or GROUP BY on large tables
Foreign key columns

When Indexes Hurt

Columns with very low cardinality (e.g., is_active = TRUE/FALSE only)
Tables written to far more often than they're read
Too many indexes slow down INSERT, UPDATE, and DELETE

EXPLAIN — Your Best Debugging Friend

EXPLAIN ANALYZE
SELECT c.name, COUNT(o.id) AS order_count
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE c.country = 'Kenya'
GROUP BY c.name;

Use EXPLAIN ANALYZE to see whether the database does a full table scan (Seq Scan) or uses an index (Index Scan). Never optimize blindly.

7. Transactions and ACID

Every time you move money, book a seat, or submit an order — you're trusting transactions to make sure nothing breaks mid-operation.

The ACID Properties

Property	Meaning	Real-World Guarantee
Atomicity	All-or-nothing	No partial transfers
Consistency	Valid state → valid state	Rules always enforced
Isolation	Concurrent transactions don't interfere	No dirty reads
Durability	Committed data survives crashes	Your data is safe

A Bank Transfer — The Classic Example

BEGIN TRANSACTION;

-- Step 1: Deduct from sender
UPDATE accounts SET balance = balance - 1000 WHERE user_id = 'alice';

-- Step 2: Add to receiver
UPDATE accounts SET balance = balance + 1000 WHERE user_id = 'bob';

-- If both succeed:
COMMIT;

-- If anything fails:
-- ROLLBACK;

Without transactions: a server crash between Step 1 and Step 2 means Alice loses money that Bob never receives.

With transactions: either both steps happen, or neither does.

8. 10 SQL Mistakes Hurting Your Performance

SELECT * — Never in Production

Always name the columns you need. SELECT * fetches unnecessary data, breaks when schemas change, and prevents index-only scans.

The Top 10 Offenders

SELECT * in production code — fetch only what you need
Not filtering before joining — WHERE early, not after millions of rows are joined
Functions on indexed columns in WHERE

   --  Kills the index
   WHERE YEAR(created_at) = 2024

   --  Preserves the index
   WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01'

HAVING for non-aggregated filters — use WHERE before grouping
OR on indexed columns — can prevent index use; try UNION ALL
NOT IN with NULLs — returns no rows if subquery has a NULL; use NOT EXISTS
Missing indexes on foreign keys — always index FK columns used in JOINs
N+1 queries — never query inside a loop; use JOINs or batch queries
Implicit type conversions — comparing VARCHAR to integers forces a cast on every row
Forgetting LIMIT while exploring — 3 rows in dev can be 30 million in prod

9. Advanced SQL Patterns

UPSERT — Insert or Update in One Statement

-- PostgreSQL syntax
INSERT INTO users (id, email, login_count)
VALUES (42, 'alice@example.com', 1)
ON CONFLICT (id) DO UPDATE
SET login_count = users.login_count + 1,
    last_seen   = NOW();

Pivot with CASE WHEN

-- Turn rows into columns
SELECT
    product,
    SUM(CASE WHEN quarter = 'Q1' THEN revenue ELSE 0 END) AS Q1,
    SUM(CASE WHEN quarter = 'Q2' THEN revenue ELSE 0 END) AS Q2,
    SUM(CASE WHEN quarter = 'Q3' THEN revenue ELSE 0 END) AS Q3,
    SUM(CASE WHEN quarter = 'Q4' THEN revenue ELSE 0 END) AS Q4
FROM sales
GROUP BY product;

Find Duplicates

SELECT email, COUNT(*) AS occurrences
FROM users
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY occurrences DESC;

Gap Detection — Finding Missing IDs

SELECT id + 1 AS gap_start
FROM orders o
WHERE NOT EXISTS (
    SELECT 1 FROM orders WHERE id = o.id + 1
)
AND id < (SELECT MAX(id) FROM orders);

Percentile Calculation

SELECT
    department,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary,
    PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY salary) AS p90_salary
FROM employees
GROUP BY department;

10. The Future of SQL

Far from fading, SQL is evolving aggressively:

DuckDB — in-process analytics SQL that runs on your laptop with serious performance
Snowflake / BigQuery / Redshift — cloud data warehouses where SQL scales to petabytes
dbt (data build tool) — SQL as software engineering, with version control and testing
SQL on JSON — modern databases now query deeply nested JSON natively
AI-assisted SQL — LLMs generate SQL, but knowing it deeply makes you 10x better at validating that output

SQL will be the last programming language to die.

And it won't.

Your SQL Learning Roadmap

Beginner

[ ] SELECT, FROM, WHERE basics
[ ] INNER JOIN and LEFT JOIN
[ ] GROUP BY and aggregate functions
[ ] ORDER BY and LIMIT

Intermediate

[ ] All JOIN types
[ ] Subqueries and CTEs
[ ] HAVING vs WHERE
[ ] NULL handling

Advanced

[ ] Window functions (RANK, LAG, LEAD, running totals)
[ ] Recursive CTEs
[ ] Indexing strategy and EXPLAIN
[ ] Transactions and isolation levels
[ ] Query optimization techniques

Conclusion

SQL is not just a tool. It's a way of thinking about data — relationally, declaratively, and powerfully.

The developers and analysts who master SQL don't just write better queries. They:

Think more clearly about data problems
Debug faster
Communicate better across engineering, product, and business teams

The language is 50+ years old and still deeply relevant. That is not an accident — it's proof.

Start writing better SQL today. The data is waiting.