Kahuthu Muriuki

Posted on Apr 18

SQL Functions You Will Actually Use in Data Work

#beginners #database #sql

Most SQL tutorials stop at SELECT, WHERE, and GROUP BY. That covers retrieval, but it does not cover the layer of work that happens between raw data and a meaningful result. In financial data environments — transaction records, reconciliation tables, KYC logs — the real analytical work depends on functions and operators that transform, filter, combine, and rank data before it becomes useful.

This article covers six categories of SQL functionality that come up repeatedly in practice: row-level functions, date and time handling, string manipulation, joins, window functions, and set operators. Each section includes syntax and examples grounded in the kind of data you encounter in financial and operational contexts.

1. Row-Level Functions

Row-level functions operate on individual records one at a time. They do not aggregate — they transform or evaluate each row in isolation.

The most commonly used ones fall into three groups: conditional logic, null handling, and type conversion.

Conditional Logic — CASE

SELECT
  transaction_id,
  amount,
  CASE
    WHEN amount >= 100000 THEN 'High Value'
    WHEN amount >= 10000  THEN 'Mid Range'
    ELSE 'Standard'
  END AS transaction_tier
FROM transactions;

CASE evaluates each row against a set of conditions and returns the first match. It works anywhere in a query — SELECT, WHERE, ORDER BY, and inside aggregate functions.

Null Handling — COALESCE and ISNULL

SELECT
  customer_id,
  COALESCE(phone_number, email, 'No contact on file') AS contact_detail
FROM customers;

COALESCE returns the first non-null value from a list. This is useful when records have multiple optional fields and you need to surface whichever one is populated. ISNULL (SQL Server) or IFNULL (MySQL) handles the simpler two-value version.

Type Conversion — CAST and CONVERT

SELECT
  CAST(account_balance AS DECIMAL(15, 2)) AS balance,
  CAST(transaction_date AS DATE) AS txn_date
FROM accounts;

Data pulled from flat files or external systems often arrives with the wrong data type. CAST forces a column into the type you need before filtering or calculation.

2. Date and Time Functions

Date logic is one of the areas where SQL gets used most heavily in financial and operational data work. Reporting periods, transaction timestamps, ageing calculations, and SLA tracking all depend on correct date handling.

Getting the Current Date

SELECT GETDATE();        -- SQL Server: returns current date and time
SELECT CURRENT_DATE;     -- Standard SQL / PostgreSQL: returns date only
SELECT NOW();            -- MySQL / PostgreSQL: returns date and time

Extracting Parts of a Date

SELECT
  transaction_id,
  YEAR(transaction_date)  AS txn_year,
  MONTH(transaction_date) AS txn_month,
  DAY(transaction_date)   AS txn_day
FROM transactions;

In PostgreSQL, use EXTRACT:

SELECT EXTRACT(MONTH FROM transaction_date) AS txn_month
FROM transactions;

Calculating the Difference Between Dates

-- SQL Server
SELECT
  loan_id,
  DATEDIFF(DAY, disbursement_date, repayment_date) AS days_to_repay
FROM loan_records;

-- PostgreSQL
SELECT
  loan_id,
  repayment_date - disbursement_date AS days_to_repay
FROM loan_records;

Adding or Subtracting Time

-- SQL Server
SELECT DATEADD(MONTH, 3, GETDATE()) AS quarter_ahead;

-- PostgreSQL
SELECT CURRENT_DATE + INTERVAL '3 months' AS quarter_ahead;

Formatting Dates for Display

-- SQL Server
SELECT FORMAT(transaction_date, 'dd-MMM-yyyy') AS formatted_date
FROM transactions;

-- MySQL
SELECT DATE_FORMAT(transaction_date, '%d-%b-%Y') AS formatted_date
FROM transactions;

Date formatting matters when reports are consumed by non-technical audiences who expect dates in a specific regional format.

3. String Functions

String functions clean, reshape, and extract text data. In practice, this comes up constantly — member names with inconsistent casing, account numbers with leading spaces, reference codes that need to be split or concatenated.

UPPER, LOWER, and TRIM

SELECT
  UPPER(first_name)  AS first_name,
  LOWER(email)       AS email,
  TRIM(account_ref)  AS account_ref
FROM members;

TRIM removes leading and trailing spaces. LTRIM and RTRIM handle one side at a time.

LEN and SUBSTRING

SELECT
  account_number,
  LEN(account_number) AS char_count,
  SUBSTRING(account_number, 1, 4) AS account_prefix
FROM accounts;

SUBSTRING(column, start, length) extracts a portion of a string. In MySQL the function is SUBSTR. This is useful for parsing structured codes — product categories embedded in reference numbers, branch identifiers in account strings, and similar patterns.

REPLACE and CHARINDEX

-- Remove hyphens from ID numbers
SELECT REPLACE(id_number, '-', '') AS clean_id
FROM kyc_records;

-- Find position of a character
SELECT CHARINDEX('@', email) AS at_position
FROM customers;

Concatenation

-- SQL Server / MySQL
SELECT CONCAT(first_name, ' ', last_name) AS full_name
FROM members;

-- SQL Server also supports the || operator in some versions
-- PostgreSQL uses ||
SELECT first_name || ' ' || last_name AS full_name
FROM members;

LIKE for Pattern Matching

SELECT *
FROM transactions
WHERE reference_code LIKE 'TXN-%';

The % wildcard matches any sequence of characters. _ matches a single character. These are used heavily in compliance work where you are scanning transaction references or flagging records that match a particular naming pattern.

4. JOINs

JOINs combine rows from two or more tables based on a related column. Understanding which join type to use determines whether you get matched records, all records from one side, or everything from both sides.

INNER JOIN

Returns only rows where the condition is met in both tables.

SELECT
  c.customer_id,
  c.full_name,
  t.transaction_id,
  t.amount
FROM customers c
INNER JOIN transactions t ON c.customer_id = t.customer_id;

This returns customers who have at least one transaction. Customers with no transaction history do not appear.

LEFT JOIN

Returns all rows from the left table, and matching rows from the right. Where there is no match, columns from the right table return NULL.

SELECT
  c.customer_id,
  c.full_name,
  t.transaction_id,
  t.amount
FROM customers c
LEFT JOIN transactions t ON c.customer_id = t.customer_id;

Use this when you need to identify records with no match — customers who have never transacted, accounts with no KYC record, loans with no repayment entries.

RIGHT JOIN

The mirror of LEFT JOIN. Returns all rows from the right table.

SELECT
  c.customer_id,
  t.transaction_id,
  t.amount
FROM customers c
RIGHT JOIN transactions t ON c.customer_id = t.customer_id;

In practice, RIGHT JOIN is less common because you can always rewrite it as a LEFT JOIN by swapping the table order.

FULL OUTER JOIN

Returns all rows from both tables. Where there is no match on either side, NULLs fill the gaps.

SELECT
  c.customer_id,
  c.full_name,
  t.transaction_id
FROM customers c
FULL OUTER JOIN transactions t ON c.customer_id = t.customer_id;

Useful for reconciliation queries where you need to see both unmatched customers and unmatched transactions in one result set.

CROSS JOIN

Produces a Cartesian product — every row in the first table paired with every row in the second.

SELECT
  p.product_name,
  r.region_name
FROM products p
CROSS JOIN regions r;

This is not something you use for retrieval in most contexts, but it is practical for generating combinations — pairing every product with every region to pre-populate a reporting matrix, for example.

5. Window Functions

Window functions perform calculations across a set of rows that are related to the current row, without collapsing the result into a single aggregate value. The row count in your output stays the same — which is what makes window functions different from GROUP BY.

The syntax always includes OVER(), which defines the window.

ROW_NUMBER

Assigns a unique sequential number to each row within a partition.

SELECT
  customer_id,
  transaction_date,
  amount,
  ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY transaction_date DESC) AS txn_rank
FROM transactions;

Where txn_rank = 1, you have each customer's most recent transaction. This is a common pattern for pulling the latest record per entity.

RANK and DENSE_RANK

SELECT
  agent_id,
  total_collections,
  RANK()       OVER (ORDER BY total_collections DESC) AS rank_with_gaps,
  DENSE_RANK() OVER (ORDER BY total_collections DESC) AS rank_no_gaps
FROM agent_performance;

RANK leaves gaps after tied positions (1, 2, 2, 4). DENSE_RANK does not (1, 2, 2, 3). The choice depends on whether the gaps matter for how results are consumed.

SUM, AVG, and COUNT as Window Functions

SELECT
  transaction_id,
  customer_id,
  amount,
  SUM(amount) OVER (PARTITION BY customer_id) AS customer_total,
  AVG(amount) OVER (PARTITION BY customer_id) AS customer_avg
FROM transactions;

This keeps every transaction row visible while also showing the customer-level total and average alongside each record — something GROUP BY cannot do without a subquery.

Running Totals

SELECT
  transaction_date,
  amount,
  SUM(amount) OVER (ORDER BY transaction_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM transactions;

Running totals built this way are cleaner than self-joins and easier to read in a query review.

LAG and LEAD

SELECT
  transaction_date,
  amount,
  LAG(amount, 1)  OVER (ORDER BY transaction_date) AS previous_txn_amount,
  LEAD(amount, 1) OVER (ORDER BY transaction_date) AS next_txn_amount
FROM transactions;

LAG looks back at the previous row. LEAD looks ahead. Both are useful for period-over-period comparisons without needing a self-join.

6. SET Operators

SET operators combine the results of two or more SELECT statements. They operate on result sets rather than individual tables, which makes them different from JOINs.

UNION

Combines two result sets and removes duplicate rows.

SELECT customer_id, full_name FROM retail_customers
UNION
SELECT customer_id, full_name FROM business_customers;

Both SELECT statements must return the same number of columns in the same order, with compatible data types.

UNION ALL

Same as UNION but keeps duplicates. Runs faster because there is no deduplication step.

SELECT transaction_id, amount, 'Q1' AS quarter FROM transactions_q1
UNION ALL
SELECT transaction_id, amount, 'Q2' AS quarter FROM transactions_q2;

Use UNION ALL when you are certain duplicates are not an issue, or when the source tables are structured to avoid them — combining quarterly partitions into a full-year view, for example.

INTERSECT

Returns only rows that appear in both result sets.

SELECT customer_id FROM savings_accounts
INTERSECT
SELECT customer_id FROM loan_accounts;

This identifies customers who hold both products — useful for cross-sell analysis or eligibility filtering.

EXCEPT (or MINUS in Oracle/MySQL)

Returns rows from the first result set that do not appear in the second.

SELECT customer_id FROM savings_accounts
EXCEPT
SELECT customer_id FROM loan_accounts;

This surfaces savings account holders who do not have a loan — a segment you might target for a lending product campaign, or flag for a financial inclusion review.

Key Points to Take Away

These six categories cover most of the transformation work that sits between a raw database and a finished analysis. A few things worth noting as you work with them:

Row-level functions and aggregates work at different layers. CASE and COALESCE operate on each row individually. SUM and AVG collapse rows. Mixing them requires understanding whether your logic belongs in SELECT, WHERE, or HAVING.

Date functions are not portable across databases. GETDATE() is SQL Server. NOW() is MySQL/PostgreSQL. CURRENT_DATE is ANSI standard. If your queries move between environments, this is where they will break first.

Window functions do not filter rows — they add columns. If you want to use a window function result as a filter condition, wrap it in a subquery or CTE. You cannot reference a window function alias directly in a WHERE clause.

JOIN type choice affects row count. An INNER JOIN on a one-to-many relationship multiplies rows. A LEFT JOIN on a table with nulls keeps records you might not expect. Test against a small known dataset before running against production volumes.

UNION vs UNION ALL is a performance decision as much as a logic one. Deduplication has a cost. Where duplicates are structurally impossible — different source tables, different time periods — UNION ALL is the right default.

Understanding how these functions interact gives you the control to move from data retrieval into actual data work.