Kahuthu Muriuki
SQL Fundamentals: DDL, DML, and Practical Data Manipulation

I have spent a good portion of my career dealing with structured data — member records at SACCOs, trade logs at a securities brokerage, and KYC verification tables on payments platforms. What all of these have in common is that they sit in relational databases, and the language you use to build, fill, query, and maintain those databases is SQL.

This week's assignment brought me back to the basics: DDL, DML, filtering, and conditional logic. Here is what I worked through and how it connects to the kind of data work I do day to day.


DDL and DML — Two Sides of the Same Coin

SQL commands fall into categories depending on what they act on. The two that matter most when you are starting out are DDL and DML.

DDL (Data Definition Language) is about structure. It deals with the skeleton of your database — tables, columns, data types, constraints. The main DDL commands are CREATE, ALTER, and DROP. When you run a DDL statement, you are changing what the database looks like, not what is stored inside it. Think of it as drawing up the blueprint for a filing cabinet before you start stuffing folders into it.
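To make those three commands concrete, here is a minimal sketch using a hypothetical branches table (not part of the assignment schema — the names are illustrative):

```sql
-- CREATE: define the structure
CREATE TABLE branches (
    branch_id INT PRIMARY KEY,
    branch_name VARCHAR(50)
);

-- ALTER: change the structure after the fact, e.g. add a column
ALTER TABLE branches ADD COLUMN region VARCHAR(50);

-- DROP: remove the structure entirely (and any data stored in it)
DROP TABLE branches;
```

Notice that none of these statements touch individual records — they reshape or remove the container itself.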

DML (Data Manipulation Language) is about the actual records. Once DDL has set up the structure, DML is how you put data in, pull data out, change it, or remove it. The core DML commands are INSERT, UPDATE, DELETE, and SELECT.

The distinction matters because mixing them up causes real problems. I have seen junior analysts on a team attempt to INSERT into a table that did not exist yet — they skipped the DDL step entirely. Structure first, data second. That order never changes.


How I Used CREATE, INSERT, UPDATE, and DELETE

CREATE

For the assignment, I started by defining the tables I would need. In my case, I set up a members table, a loan_products table, and a transactions table — modelled loosely on SACCO operations I have worked with before.

```sql
CREATE TABLE members (
    member_id INT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    gender CHAR(1),
    date_of_birth DATE,
    branch VARCHAR(50),
    kyc_status VARCHAR(20),
    date_joined DATE
);
```

Every column has a defined data type, and member_id is set as the primary key so each record is uniquely identifiable. This is the same logic I followed when modelling dimension tables in Power BI — the grain of the table has to be clear from the start.
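The other two tables follow the same pattern. As a sketch of how transactions might reference members — the column names here are my assumptions, since only members is defined in full above — a foreign key ties each transaction back to a unique member:

```sql
-- Hypothetical sketch of the transactions table; column names are assumed
CREATE TABLE transactions (
    transaction_id INT PRIMARY KEY,
    member_id INT,
    amount DECIMAL(12, 2),
    transaction_date DATE,
    FOREIGN KEY (member_id) REFERENCES members(member_id)
);
```

The FOREIGN KEY constraint is what enforces the relationship at the database level, rather than leaving it implicit in the BI layer.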

INSERT

With the table in place, I populated it with records.

```sql
INSERT INTO members (member_id, first_name, last_name, gender, date_of_birth, branch, kyc_status, date_joined)
VALUES
(1, 'Amina', 'Wanjiku', 'F', '1990-03-12', 'Westlands', 'Verified', '2019-06-15'),
(2, 'Brian', 'Ochieng', 'M', '1987-07-25', 'Mombasa CBD', 'Verified', '2018-01-10'),
(3, 'Cynthia', 'Mutua', 'F', '1992-11-05', 'Kisumu Town', 'Pending', '2021-04-22'),
(4, 'David', 'Kamau', 'M', '1985-02-18', 'Westlands', 'Verified', '2017-09-03'),
(5, 'Esther', 'Akinyi', 'F', '1995-06-30', 'Nakuru East', 'Rejected', '2022-08-11'),
(6, 'Felix', 'Otieno', 'M', '1988-09-14', 'Eldoret Central', 'Verified', '2020-02-28');
```

In a production environment I would be loading this from a CSV or an ETL pipeline, but the logic is the same — each INSERT statement maps values to the columns defined in the CREATE step.
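As one example of what that bulk load might look like — this is PostgreSQL syntax, and the file path is hypothetical — a CSV can be loaded in a single statement:

```sql
-- PostgreSQL bulk load; other databases use different syntax
-- (e.g. MySQL's LOAD DATA INFILE)
COPY members (member_id, first_name, last_name, gender, date_of_birth, branch, kyc_status, date_joined)
FROM '/data/members.csv'
WITH (FORMAT csv, HEADER true);
```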

UPDATE

Records change. A member moves branches, a KYC status gets resolved, a transaction amount needs correction. UPDATE handles that.

```sql
UPDATE members
SET branch = 'Kilimani', kyc_status = 'Verified'
WHERE member_id = 3;
```

This corrects Cynthia's branch and moves her KYC status from 'Pending' to 'Verified'. The WHERE clause is critical here — run an UPDATE without it and you overwrite every row in the table. I have seen that happen on a live SACCO database. It is not a mistake you make twice.
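One habit that guards against that mistake: run the WHERE clause as a SELECT first, so you see exactly which rows the UPDATE will touch before you touch them. A sketch of the pattern:

```sql
-- Step 1: preview the rows the UPDATE would affect
SELECT member_id, branch, kyc_status
FROM members
WHERE member_id = 3;

-- Step 2: only after confirming, run the UPDATE with the same WHERE clause
UPDATE members
SET branch = 'Kilimani', kyc_status = 'Verified'
WHERE member_id = 3;
```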

DELETE

Sometimes records need to go. A duplicate entry, a test row left in production, or a member account that was created in error.

```sql
DELETE FROM members
WHERE member_id = 5;
```

Same rule applies: always use WHERE with DELETE. Omitting it wipes the entire table clean.
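Where the database supports explicit transactions (PostgreSQL, or MySQL with InnoDB, for example), wrapping a DELETE in one adds a further safety net — you can inspect the result before making it permanent. A sketch:

```sql
BEGIN;

DELETE FROM members
WHERE member_id = 5;

-- Sanity-check the result while the change is still reversible
SELECT COUNT(*) FROM members;

COMMIT;    -- or ROLLBACK; if the count looks wrong
```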


Filtering with WHERE

The WHERE clause is how you narrow down results to only the rows that matter. Without it, every SELECT, UPDATE, and DELETE hits the full table.

Here are the operators I used most in the assignment:

Equality (=) — straightforward exact match.

```sql
SELECT * FROM members WHERE branch = 'Westlands';
```

This returns only members registered at the Westlands branch.

Greater than (>) — useful for numerical or date comparisons.

```sql
SELECT * FROM transactions WHERE amount > 50000;
```

In a SACCO context, this pulls transactions above KES 50,000 — the kind of threshold that triggers additional AML checks.

BETWEEN — filters within a range, inclusive on both ends.

```sql
SELECT * FROM transactions
WHERE transaction_date BETWEEN '2024-01-01' AND '2024-03-31';
```

I used this to isolate Q1 2024 transactions. Date range filtering comes up constantly in financial reporting.

IN — matches against a list of values.

```sql
SELECT * FROM members
WHERE branch IN ('Westlands', 'Mombasa CBD', 'Kisumu Town');
```

Cleaner than writing three separate OR conditions. I use IN a lot when pulling data for specific branches or account types.

LIKE — pattern matching on text fields.

```sql
SELECT * FROM members
WHERE last_name LIKE 'W%';
```

The % wildcard matches any sequence of characters. This finds all members whose last name starts with 'W'. The _ wildcard matches a single character if you need more precision.
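As a quick illustration of the _ wildcard (this example is mine, not from the assignment):

```sql
-- '_' matches exactly one character: any five-letter first name ending in 'mina'
SELECT * FROM members
WHERE first_name LIKE '_mina';
```

Against the sample data above, this would match 'Amina'.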

All of these operators can be combined with AND and OR to build more specific filters. The key thing is that WHERE keeps your queries targeted — you get back what you need and nothing more.
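A sketch of a combined filter — note the parentheses, which matter because AND binds more tightly than OR:

```sql
-- Verified members at either of two branches who joined before 2020
SELECT * FROM members
WHERE kyc_status = 'Verified'
  AND (branch = 'Westlands' OR branch = 'Mombasa CBD')
  AND date_joined < '2020-01-01';
```

Without the parentheses, the OR condition would be evaluated last and the filter would return different rows.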


Using CASE WHEN to Transform Data

Raw data is rarely presentation-ready. CASE WHEN lets you apply conditional logic inside a query, similar to nested IF statements in Excel but running directly on the database.

In the assignment, I used it to classify transaction amounts into risk tiers:

```sql
SELECT
    member_id,
    amount,
    CASE
        WHEN amount >= 100000 THEN 'High Value'
        WHEN amount >= 50000 THEN 'Medium Value'
        WHEN amount >= 10000 THEN 'Standard'
        ELSE 'Micro'
    END AS risk_tier
FROM transactions;
```

This adds a computed column called risk_tier to the result set without changing the underlying table. In compliance work, this kind of classification feeds directly into suspicious transaction reports — you tag records by threshold, then route them for review.

CASE WHEN evaluates top to bottom and stops at the first match, so the order of conditions matters. A transaction of KES 120,000 hits the first condition and gets labelled 'High Value' — it does not fall through to the next one.

I also used it to flag KYC status into a binary ready/not-ready indicator:

```sql
SELECT
    first_name,
    last_name,
    kyc_status,
    CASE
        WHEN kyc_status = 'Verified' THEN 'Active'
        ELSE 'Restricted'
    END AS account_status
FROM members;
```

This is the kind of derived column I would build into a Power BI data model as a calculated column — except here it runs at the SQL layer before the data even reaches the BI tool.
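One way to keep that logic at the SQL layer permanently — a hypothetical extension, not part of the assignment — is to wrap the query in a view, so the BI tool reads the derived column as if it were a table:

```sql
-- Persist the derived column as a view; the view name is my own
CREATE VIEW member_account_status AS
SELECT
    first_name,
    last_name,
    kyc_status,
    CASE
        WHEN kyc_status = 'Verified' THEN 'Active'
        ELSE 'Restricted'
    END AS account_status
FROM members;
```

The classification logic then lives in one place, instead of being re-implemented in every downstream report.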


Reflection

The SQL covered this week is not new to me conceptually — I have written plenty of queries in the course of my work. But going back to the fundamentals and writing each statement out deliberately forced me to think about things I normally take for granted.

What caught my attention most was how much of my day-to-day data work maps directly to these four operations. Every member onboarding is an INSERT. Every KYC status update is an UPDATE. Every duplicate cleanup is a DELETE. And every report starts with a SELECT and a WHERE clause.

The part I found most useful was working through CASE WHEN with financial thresholds. In my compliance work, transaction classification is not optional — regulators expect it. Writing those conditions as SQL rather than handling them in Excel or Power Query is cleaner and more auditable.

One thing that tripped me up briefly was column aliasing with AS inside CASE WHEN — I initially placed the alias before the END keyword instead of after it. Small syntax issue, but it threw an error that took a minute to track down. SQL is unforgiving about placement, and that is a good discipline to have when you are writing queries that run against production data.
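For the record, the alias names the whole CASE expression, so it goes after END — a minimal example of the correct placement:

```sql
-- Wrong (syntax error): ... THEN 'High Value' AS risk_tier ... END
-- Right: the alias follows the complete CASE expression
SELECT
    amount,
    CASE
        WHEN amount >= 100000 THEN 'High Value'
        ELSE 'Standard'
    END AS risk_tier    -- alias goes here, after END
FROM transactions;
```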

Overall, a solid week. The fundamentals are the foundation — everything from joins to window functions to stored procedures builds on top of CREATE, INSERT, SELECT, and WHERE.


Lawrence Kahuthu Muriuki — Data & Finance Professional, Nairobi
