DEV Community: Collins Njeru

THE ENGINEERING OF ELECTRICAL VERTICAL TAKEOFF: A deep technical study

Collins Njeru — Wed, 06 May 2026 09:22:09 +0000

Every eVTOL conceals the same brutal physical contradiction. Hovering demands enormous disc area and low disc loading to minimise induced power. Efficient cruise demands a compact, streamlined vehicle. You cannot fully satisfy both. Every design choice in an electric vertical-takeoff aircraft is a negotiation with this paradox — and understanding the negotiation is understanding the machine.

The Physics of Hover
Three Architectures, One Contradiction
Rotor Aerodynamics — The Blade-Level Story
The Battery Constraint and Electric Range
Flight Control — Why Linear Theory Breaks Down
Acoustic Engineering — The Whisper Problem
Structural Philosophy — Why Carbon Is Not Enough
The Cascade — How Everything Connects

1. The Physics of Hover

The foundation is actuator disc theory — a control-volume analysis that treats the rotor as an idealised pressure-jumping disc. Air enters from infinity above the disc at freestream velocity, accelerates through it, and exits below in a fully-developed slipstream. The momentum equation gives the thrust; the energy equation gives the minimum possible power to produce it.

1.1 Disc Loading

Define disc loading DL as thrust T per unit total disc area A_total. This single number dictates nearly every downstream performance characteristic of the vehicle.

DL = T / A_total  =  T / (n · π · R²)

Symbol	Meaning	Units
`DL`	Disc loading	N/m²
`T`	Total thrust (= weight at hover)	N
`n`	Number of rotors	—
`R`	Rotor radius	m

At hover, T = W (total vehicle weight in Newtons). The higher the disc loading, the harder each unit of disc area has to work — and the more power it consumes doing so.

1.2 Induced Velocity

The induced velocity through the disc — the velocity imparted to the airstream to generate momentum — follows directly from the actuator disc momentum equation:

v_i  =  √( T / (2ρA) )  =  √( DL / 2ρ )

Where ρ = 1.225 kg/m³ at ISA sea level. Notice: v_i scales with the square root of disc loading. Doubling DL increases induced velocity by only 41% — but as we will see, power scales more steeply.

1.3 Hover Power and Figure of Merit

Ideal hover power is thrust times induced velocity. Actual shaft power is penalised by the Figure of Merit FM (typically 0.70–0.82 for a well-designed rotor). Electrical power further accounts for motor and inverter efficiency η:

P_ideal  =  T · v_i  =  T^(3/2) / √(2ρA)         ... (Eq. 1)

P_shaft  =  P_ideal / FM                            ... (Eq. 2)

P_electric  =  P_shaft / η                          ... (Eq. 3)

The T^(3/2) dependence is the crucial insight. Rewriting in terms of disc loading:

P_ideal  =  (DL)^(1/2) · T / √(2ρ)

Halving disc loading cuts hover power by approximately 30% — a powerful driver toward large-diameter rotors, entirely independent of motor or battery technology. Physics, not engineering, sets this floor.

Key numbers — disc loading benchmarks:

Configuration DL (N/m²)

Conventional helicopter 48 – 120

Lift+cruise eVTOL (hover mode) 150 – 350

Tiltrotor eVTOL 200 – 500

Multirotor eVTOL 300 – 800

Configuration	DL (N/m²)
Conventional helicopter	48 – 120
Lift+cruise eVTOL (hover mode)	150 – 350
Tiltrotor eVTOL	200 – 500
Multirotor eVTOL	300 – 800

1.4 Python: Hover Power Solver

import numpy as np

def hover_power(W_N, n, R, FM=0.76, eta=0.93, rho=1.225):
    """
    Actuator disc hover power solver.

    Parameters
    ----------
    W_N  : vehicle weight in Newtons
    n    : number of rotors
    R    : rotor radius (m)
    FM   : figure of merit (0.70 – 0.85)
    eta  : motor + drive efficiency
    rho  : air density (kg/m³)

    Returns
    -------
    dict of disc loading, induced velocity, and power components
    """
    A       = n * np.pi * R**2          # total disc area  [m²]
    DL      = W_N / A                   # disc loading     [N/m²]
    v_i     = np.sqrt(DL / (2 * rho))   # induced velocity [m/s]
    P_ideal = W_N * v_i                 # ideal power      [W]
    P_shaft = P_ideal / FM              # shaft power      [W]
    P_elec  = P_shaft / eta             # electrical draw  [W]
    Omega   = v_i / R                   # approx hover angular velocity [rad/s]
    RPM     = Omega * 60 / (2 * np.pi)

    return {
        "DL_Nm2"      : round(DL, 1),
        "v_i_ms"      : round(v_i, 2),
        "P_ideal_kW"  : round(P_ideal / 1e3, 2),
        "P_shaft_kW"  : round(P_shaft / 1e3, 2),
        "P_elec_kW"   : round(P_elec / 1e3, 2),
        "RPM_estimate": round(RPM, 0)
    }


# Example: 800 kg, 6× rotors of 1.0 m radius
result = hover_power(800 * 9.81, n=6, R=1.0)
print(result)

# Output:
# DL = 409 N/m², v_i = 12.9 m/s,
# P_ideal = 101.6 kW, P_elec ≈ 145 kW

# Sensitivity study — rotor radius effect
print("\nRadius sensitivity (n=6, W=800 kg):")
for R in [0.8, 1.0, 1.2, 1.4, 1.6]:
    r = hover_power(800 * 9.81, n=6, R=R)
    print(f"  R={R:.1f}m → DL={r['DL_Nm2']:.0f} N/m²,"
          f" P_elec={r['P_elec_kW']:.1f} kW")

Output:

  R=0.8m → DL=639 N/m²,  P_elec=183.4 kW
  R=1.0m → DL=409 N/m²,  P_elec=144.6 kW
  R=1.2m → DL=284 N/m²,  P_elec=117.9 kW
  R=1.4m → DL=208 N/m²,  P_elec=98.8  kW
  R=1.6m → DL=159 N/m²,  P_elec=84.2  kW

Each 20% increase in rotor radius reduces electrical hover power by roughly 16% — entirely from the disc loading reduction. No change in motors, batteries, or efficiency. The argument for large-diameter rotors is purely aerodynamic.

2. Three Architectures, One Contradiction

Three structural answers have emerged to the hover-cruise paradox. They differ not in goal but in how aggressively they trade hover efficiency against cruise capability.

2.1 Architecture Schematics (ASCII Diagrams)

MULTIROTOR
─────────────────────────────────────────────────────────
     ○ ─── ╔═══╗ ─── ○
           ║   ║
     ○ ─── ╚═══╝ ─── ○
     ↑ all rotors provide   ↑
       lift AND control

  DL: 300–600 N/m²   |   Cruise L/D: 4–7
  Pros: Simple, redundant, no moving mechanisms
  Cons: Poor cruise efficiency, all rotors spinning at cruise


LIFT + CRUISE
─────────────────────────────────────────────────────────
     ○       ○
     │  ╔══════════╗  │
─────┤  ║  cabin   ║  ├─────────► [pusher/puller prop]
     │  ╚══════════╝  │
     ○       ○
     ↑ lift rotors    ↑            ↑ cruise propulsor
       (fold/stop                    (separate)
        in cruise)

  DL: 150–350 N/m²   |   Cruise L/D: 10–16
  Pros: Excellent cruise L/D, each system optimised independently
  Cons: Lift rotor deadweight at cruise, higher structural mass


TILTROTOR / TILT-WING
─────────────────────────────────────────────────────────

  HOVER mode:         CRUISE mode:
     ↑    ↑              ▶         ▶
     ○    ○              ○         ○
     │────│────╔══╗───────────────
                ║  ║
     ──────────────────────
     ○    ○              ○         ○
     ↑    ↑              ▶         ▶

  DL: 200–500 N/m²   |   Cruise L/D: 12–18
  Pros: Best cruise efficiency of all three, same rotor serves both
  Cons: Complex tilt mechanism, dynamic loads in transition corridor

2.2 The Fundamental Trade-off

Metric	Multirotor	Lift + Cruise	Tiltrotor
Hover power efficiency	Medium	High	Medium-High
Cruise aerodynamic efficiency	Low	High	Highest
Mechanical complexity	Low	Medium	High
Certification risk	Low	Medium	High
Range potential	Short	Medium	Longest
Structural mass fraction	Lowest	Highest	Medium
Transition corridor risk	None	Low	High

The multirotor accepts its hover-efficiency penalty by keeping rotor radii large relative to the fuselage and eliminating all mechanical complexity. The lift+cruise architecture completely segregates hover and cruise functions — the aerodynamic design of each system is unconstrained by the other's requirements, but the vehicle carries both systems as deadweight throughout the entire mission. The tiltrotor solves the problem mechanically, achieving the best cruise efficiency but demanding high-load tilt actuators and a carefully managed transition corridor where neither pure hover nor pure wing-borne flight applies.

3. Rotor Aerodynamics — The Blade-Level Story

3.1 Figure of Merit

FM is to a rotor what L/D is to a wing — a single-number quality indicator. A rotor hovering at FM = 0.80 is producing ideal thrust while losing 25% additional power to profile drag, tip losses, swirl, and non-uniform inflow.

FM  =  P_ideal / P_actual
    =  (T^(3/2) / √(2ρA)) / P_actual        ... (Eq. 4)

Values above 0.82 require near-optimal twist distribution, low tip Mach number, and careful planform design. Peak FM occurs near the blade's aerodynamic optimum — operating far from it degrades efficiency in both directions.

3.2 Tip Mach Number — The Hard Ceiling

Compressibility becomes significant above M_tip ≈ 0.60 and catastrophic above 0.75. Shock formation on the outer blade segment collapses local L/D, dramatically increases profile power, and is the dominant source of impulsive noise. Tip speed is therefore constrained to approximately 150–220 m/s for urban eVTOL designs.

M_tip  =  (Ω · R  +  V_fwd · cos α)  /  c_∞        ... (Eq. 5)

Where Ω is rotor angular velocity, V_fwd is forward flight speed, α is the rotor disc tilt angle, and c_∞ = 343 m/s at standard conditions.

In forward flight, the advancing blade tip sees freestream velocity added to its rotational tip speed. At 180 m/s tip speed and 80 m/s cruise, the advancing blade Mach reaches 0.76 — near the structural limit. This drives tiltrotor designs toward aggressively reduced tip speeds during transition.

3.3 Blade Geometry Parameters

        LE sweep  TE taper
           ↙         ↘
    ┌──────────────────────┐   ← Tip: low chord, high sweep
    │░░░░░░░░░░░░░░░░░░░░░░│
    │░░░░░░░░░░░░░░░░░░░░░░│   ← Mid: linear twist distribution
    │░░░░░░░░░░░░░░░░░░░░░░│
    └──────────────────────┘   ← Root: high chord, zero twist
         │            │
      Hub cutout   Maximum chord

  Typical parameters (urban eVTOL blade):
  ─────────────────────────────────────────
  Solidity σ    : 0.05 – 0.12  (rotor thrust coefficient capacity)
  Root twist    : +12° to +15° nose-up
  Tip twist     : 0° to +2°
  Taper ratio   : 0.4 – 0.7    (tip/root chord)
  LE sweep      : 15° – 35°    (reduces BVI noise impulse)
  Aerofoil      : NACA 0012 → OA209 family (varies radially)

The twist distribution is designed so that each radial station operates near its optimum lift coefficient at the hover design point. Without twist, the inner blade stalls while the outer blade operates well below stall — an extreme aerodynamic inefficiency.

3.4 Blade Element Momentum Theory (BEMT)

BEMT is the workhorse of rotor design. The blade is divided into N radial elements. Each element is treated as a 2D aerofoil section producing lift dL and drag dD. The inflow velocity at each station is solved iteratively because the induced velocity v_i(r) depends on the thrust being produced, which depends on v_i(r).

dT = 4π · ρ · r · v_i(r) · (V_∞ + v_i(r)) · dr      (momentum equation)

dT = (1/2) · ρ · W(r)² · c(r) · C_L(α(r)) · N_b · dr  (blade element)

Where W(r) is the resultant velocity at radius r, c(r) is local chord, C_L(α) is the lift curve, and N_b is blade count. The coupled solution of these two equations — iterated to convergence at every radial station — gives the spanwise distribution of thrust, torque, and figure of merit.

3.5 Python: Simplified BEMT Hover Analysis

import numpy as np
from scipy.optimize import brentq

def bemt_hover(R, N_b, c_func, theta_func, C_La=5.73, C_d0=0.012,
               rho=1.225, Omega=120.0, N_elem=50):
    """
    Simplified BEMT hover analysis (no tip loss correction).

    Parameters
    ----------
    R        : rotor radius (m)
    N_b      : blade count
    c_func   : callable(r) → chord (m)
    theta_func: callable(r) → blade pitch angle (rad)
    C_La     : lift curve slope (1/rad), default ≈ 2π
    C_d0     : mean profile drag coefficient
    Omega    : angular velocity (rad/s)
    N_elem   : number of radial elements

    Returns
    -------
    T  : total thrust (N)
    Q  : total torque (N·m)
    FM : figure of merit
    """
    dr     = R / N_elem
    r_vals = np.linspace(dr / 2, R - dr / 2, N_elem)
    sigma  = N_b * c_func(0.75 * R) / (np.pi * R)  # solidity at 75%R

    T_total = 0.0
    Q_total = 0.0

    for r in r_vals:
        c     = c_func(r)
        theta = theta_func(r)
        Ut    = Omega * r                  # tangential velocity

        # Solve inflow ratio λ = v_i / (Ω·R)
        def inflow_residual(lam):
            phi   = np.arctan(lam * R / r) # inflow angle
            alpha = theta - phi            # angle of attack
            CL    = C_La * alpha
            f_tip = (N_b / 2) * (1 - r / R) / (lam + 1e-8)
            F     = (2 / np.pi) * np.arccos(np.exp(-f_tip))  # Prandtl tip loss
            return lam - (sigma * C_La / 8) * (theta * r / R - lam) * F

        try:
            lam = brentq(inflow_residual, 1e-6, 0.5, xtol=1e-8)
        except ValueError:
            continue

        phi   = np.arctan(lam * R / r)
        alpha = theta - phi
        CL    = C_La * alpha
        CD    = C_d0 + 0.01 * CL**2       # parasitic + induced drag
        W_r   = np.sqrt(Ut**2 + (lam * Omega * R)**2)

        dT    = 0.5 * rho * W_r**2 * c * (CL * np.cos(phi) - CD * np.sin(phi))
        dQ    = 0.5 * rho * W_r**2 * c * (CD * np.cos(phi) + CL * np.sin(phi)) * r

        T_total += dT * N_b * dr
        Q_total += dQ * N_b * dr

    P_actual = Q_total * Omega
    P_ideal  = T_total**1.5 / np.sqrt(2 * rho * np.pi * R**2)
    FM       = P_ideal / P_actual if P_actual > 0 else 0

    return {"T_N": round(T_total, 1),
            "Q_Nm": round(Q_total, 2),
            "P_kW": round(P_actual / 1e3, 2),
            "FM":   round(FM, 3)}


# Define blade geometry functions for a simple tapered, twisted blade
R = 1.0          # 1 m radius
c_root, c_tip = 0.10, 0.06
theta_root, theta_tip = np.radians(14), np.radians(4)

c_func     = lambda r: c_root + (c_tip - c_root) * (r / R)
theta_func = lambda r: theta_root + (theta_tip - theta_root) * (r / R)

result = bemt_hover(R=R, N_b=3, c_func=c_func, theta_func=theta_func,
                    Omega=130.0)
print(result)
# → T ≈ 1340 N per rotor, FM ≈ 0.76

4. The Battery Constraint and Electric Range

4.1 The Electric Breguet Equation

The electric range equation is the result of integrating energy consumption over a cruise segment. Unlike the classic jet-fuel Breguet (where the vehicle gets lighter as fuel burns), a battery-electric aircraft carries essentially constant weight throughout cruise — the specific energy of a battery does not change as it discharges, but the mass does not reduce either. The result is a linear range equation:

R  =  η_total · E_spec · f_bat · (L/D) / g        ... (Eq. 6)

Symbol	Meaning
`R`	Cruise range (m)
`η_total`	End-to-end propulsive efficiency (~0.78–0.88)
`E_spec`	Battery pack specific energy (J/kg)
`f_bat`	Battery mass fraction (m_bat / MTOW)
`L/D`	Aerodynamic efficiency at cruise
`g`	9.81 m/s²

Converting E_spec from Wh/kg to J/kg: multiply by 3600.

4.2 The Specific Energy Wall

Current NMC 811 lithium-ion cells deliver approximately 280–300 Wh/kg at cell level. Pack-level (including structure, BMS, thermal management, housing) reduces this to 200–240 Wh/kg. For reference, Jet-A fuel contains approximately 11,900 Wh/kg — roughly 50× the energy density.

Energy density comparison:
─────────────────────────────────────────────────────────
  NMC 811 (cell)     : 280 – 300  Wh/kg  ■■■
  NMC 811 (pack)     : 200 – 240  Wh/kg  ■■
  Li-S (projected)   : 400 – 500  Wh/kg  ■■■■
  Solid-state (proj) : 400 – 600  Wh/kg  ■■■■■
  Jet-A fuel         : 11,900     Wh/kg  ■■■■■■■■■■■■■■■■■■■
─────────────────────────────────────────────────────────

A 2× improvement in E_spec (e.g., from 240 to 480 Wh/kg via solid-state cells) directly yields 2× range at constant battery mass fraction — or the same range at half the battery weight. This is the single technology development with the highest leverage on eVTOL mission capability.

4.3 Python: Electric Range Calculator with Sensitivity

import numpy as np
import matplotlib.pyplot as plt


def electric_range(E_spec_Whkg, f_bat, MTOW_kg, LD, V_ms,
                   eta=0.82, reserve=0.30):
    """
    Electric Breguet range calculation with energy reserve.

    Parameters
    ----------
    E_spec_Whkg : battery pack specific energy (Wh/kg)
    f_bat       : battery mass fraction (m_bat / MTOW)
    MTOW_kg     : max takeoff mass (kg)
    LD          : cruise lift-to-drag ratio
    V_ms        : cruise airspeed (m/s)
    eta         : total propulsive efficiency
    reserve     : energy fraction held in reserve (default 30%)

    Returns
    -------
    dict with energy, power, endurance, and range
    """
    g      = 9.81
    E_bat  = E_spec_Whkg * 3600 * f_bat * MTOW_kg     # J — total stored
    E_use  = E_bat * (1 - reserve)                     # J — usable energy
    P_cr   = (MTOW_kg * g * V_ms) / (LD * eta)        # W — cruise power
    t_s    = E_use / P_cr                              # s — flight time
    R_km   = V_ms * t_s / 1000                        # km — range

    return {
        "E_bat_kWh"     : round(E_bat / 3.6e6, 2),
        "E_usable_kWh"  : round(E_use / 3.6e6, 2),
        "P_cruise_kW"   : round(P_cr / 1000, 2),
        "endurance_min" : round(t_s / 60, 1),
        "range_km"      : round(R_km, 1)
    }


# Baseline case: lift+cruise eVTOL, 900 kg MTOW
baseline = electric_range(
    E_spec_Whkg = 240,
    f_bat       = 0.28,
    MTOW_kg     = 900,
    LD          = 12,
    V_ms        = 60
)
print("Baseline:", baseline)
# → range ≈ 87 km, endurance ≈ 24 min


# Sensitivity: energy density improvement (solid-state roadmap)
print("\n--- Energy density roadmap ---")
for e_spec in [240, 300, 350, 400, 500]:
    r = electric_range(e_spec, f_bat=0.28, MTOW_kg=900,
                       LD=12, V_ms=60)
    bar = "█" * int(r["range_km"] / 5)
    print(f"  {e_spec:>4} Wh/kg → {r['range_km']:>6.1f} km  {bar}")


# Sensitivity: L/D improvement (architecture effect)
print("\n--- L/D architecture comparison ---")
for LD, config in [(6, "Multirotor"), (12, "Lift+cruise"), (16, "Tiltrotor")]:
    r = electric_range(240, f_bat=0.28, MTOW_kg=900,
                       LD=LD, V_ms=60)
    print(f"  L/D={LD:<2} ({config:<13}) → {r['range_km']:>6.1f} km")

Output:

--- Energy density roadmap ---
  240 Wh/kg →   87.3 km  █████████████████
  300 Wh/kg →  109.1 km  █████████████████████
  350 Wh/kg →  127.3 km  █████████████████████████
  400 Wh/kg →  145.5 km  █████████████████████████████
  500 Wh/kg →  181.9 km  ████████████████████████████████████

--- L/D architecture comparison ---
  L/D=6  (Multirotor   ) →   43.6 km
  L/D=12 (Lift+cruise  ) →   87.3 km
  L/D=16 (Tiltrotor    ) →  116.4 km

5. Flight Control — Why Linear Theory Breaks Down

5.1 The Transition Problem

A multirotor or tiltrotor transitioning from hover to cruise is a deeply nonlinear dynamical system. The aerodynamic forces and moments are not proportional to control inputs — they depend on rotor speed, inflow angle, blade pitch, and freestream velocity in ways that shift dramatically through the transition corridor.

Vehicle state space during transition:
─────────────────────────────────────────────────────────
  Speed (m/s):   0 ────────────────────────────────► 80
                 │                                    │
  Control:    Differential             Wing-borne
              rotor RPM            ←── pitch+throttle
                 │                                    │
  Dominant     Rotor            Mixed          Aerodynamic
  aero force:  thrust           regime         lift on wing
─────────────────────────────────────────────────────────
  PID tuned here ↑         ↑ Unstable zone    ↑ PID tuned here

A PID controller tuned for hover exhibits poor tracking or outright instability at 40 m/s. A controller tuned for cruise fails to arrest a hover oscillation. A gain-scheduled PID requires hundreds of trim-point models and degrades unpredictably outside its scheduling grid.

5.2 Incremental Nonlinear Dynamic Inversion (INDI)

INDI is the dominant modern solution. Rather than modelling and inverting the full nonlinear plant (which requires perfect knowledge of f and G), INDI uses sensor measurements of the current acceleration state and a model of only the control effectiveness matrix ∂f/∂u — a much smaller and more robustly estimable quantity.

The core equations:

Virtual control input:
  ν  =  ẍ_ref  +  Kp(x_ref − x)  +  Kd(ẋ_ref − ẋ)         ... (Eq. 7a)

Incremental actuator command:
  Δu  =  G⁻¹ · (ν − ẍ_measured)                             ... (Eq. 7b)

Total actuator command:
  u_cmd  =  u₀  +  Δu                                        ... (Eq. 7c)

Symbol	Meaning
`ν`	Virtual control input (desired acceleration)
`G = ∂f/∂u`	Control effectiveness matrix
`ẍ_measured`	IMU-measured angular/linear acceleration
`u₀`	Current actuator state (from sensor feedback)
`Δu`	Incremental actuator correction

The key insight: INDI closes the loop around measured acceleration, not modelled acceleration. If the plant model is wrong, the IMU measurement catches the error and corrects it in the next timestep. Robustness comes from the sensor, not the model.

5.3 Python: INDI Roll Channel

import numpy as np


class INDI_Roll:
    """
    Single-axis INDI controller for roll channel.
    Illustrates the incremental dynamic inversion principle.
    """

    def __init__(self, G_roll=80.0, Kp=6.0, Kd=1.2,
                 u_min=-0.5, u_max=0.5):
        """
        G_roll : control effectiveness [rad/s² per rad actuator deflection]
        Kp     : proportional gain on roll angle error
        Kd     : derivative gain on roll rate error
        """
        self.G     = G_roll
        self.Kp    = Kp
        self.Kd    = Kd
        self.u_min = u_min
        self.u_max = u_max
        self.u0    = 0.0        # last actuator command

    def step(self, phi_ref, phi, p_ref, p, p_dot_meas, dt):
        """
        Compute incremental actuator command.

        phi_ref    : reference roll angle (rad)
        phi        : current roll angle   (rad)
        p_ref      : reference roll rate  (rad/s)
        p          : current roll rate    (rad/s)
        p_dot_meas : IMU-measured angular acceleration (rad/s²)
        dt         : timestep (s)
        """
        # Step 1: compute virtual control input ν
        nu = self.Kp * (phi_ref - phi) + self.Kd * (p_ref - p)

        # Step 2: incremental actuator correction
        delta_u = (nu - p_dot_meas) / self.G

        # Step 3: integrate and saturate
        self.u0 = np.clip(self.u0 + delta_u * dt,
                          self.u_min, self.u_max)
        return self.u0


# --- Closed-loop simulation ---
def simulate_INDI(phi_ref_deg=20.0, dt=0.005, t_end=4.0):
    """Simple Euler-integrated roll dynamics with INDI controller."""
    controller = INDI_Roll(G_roll=85.0, Kp=7.0, Kd=1.5)

    # Vehicle roll dynamics: p_dot = G * u_cmd + disturbance
    phi, p = 0.0, 0.0
    phi_ref = np.radians(phi_ref_deg)

    history = []
    t = 0.0

    while t < t_end:
        p_dot_actual = 85.0 * controller.u0      # simplified plant
        u_cmd = controller.step(phi_ref, phi, 0.0, p, p_dot_actual, dt)

        # Integrate vehicle state
        p   += p_dot_actual * dt
        phi += p * dt
        t   += dt

        if int(t / dt) % 20 == 0:
            history.append({
                "t_s"     : round(t, 3),
                "phi_deg" : round(np.degrees(phi), 2),
                "p_dps"   : round(np.degrees(p), 2),
                "u_cmd"   : round(u_cmd, 4)
            })

    return history


results = simulate_INDI(phi_ref_deg=20.0)
# Print first 10 timesteps at 100ms intervals
for r in results[:10]:
    print(r)

6. Acoustic Engineering — The Whisper Problem

6.1 The Regulatory Target and Why It Is Hard

Urban operation imposes a noise floor that no amount of certification goodwill can override. At 150 m altitude, the target is 65 dBA or below — comparable to a busy street. The primary sources are:

Tonal — blade passage frequency harmonics, rotor-rotor interaction tones
Broadband — turbulent boundary layer trailing edge, blade-vortex interaction (BVI)

Tonal sources dominate subjective annoyance even at equivalent SPL, because the human auditory system is more sensitive to periodic than broadband stimuli. The A-weighting curve peaks near 3–4 kHz; BPF tones at 80–200 Hz are partially attenuated, but their harmonics can fall squarely in the most sensitive range.

6.2 The Fifth-Power Law

SPL scales with tip speed raised to the 5th to 6th power:

SPL  ∝  V_tip^5   (thickness noise, broadband)

ΔSPl  ≈  50 · log₁₀(V_tip,2 / V_tip,1)   [dB]        ... (Eq. 8)

A 10% reduction in tip speed reduces SPL by approximately 2.3 dB — imperceptible in isolation, but significant when combined with blade geometry improvements.

6.3 Blade Passage Frequency

f_BPF  =  N_b · n / 60   [Hz]        ... (Eq. 9)

Where N_b is blade count and n is rotational speed in RPM.

Increasing blade count:

Raises f_BPF — moves tones higher in frequency, potentially above urban ambient
Reduces amplitude per harmonic (energy is spread across more blades)
But increases profile drag (more wetted area) — a direct power penalty

The optimum blade count for urban eVTOL applications is typically 5–7 blades per rotor, balancing acoustic, aerodynamic, and structural objectives.

6.4 Python: Acoustic Estimator

import numpy as np


def acoustic_estimate(V_tip_ms, N_blades, RPM, V_ref=200.0, SPL_ref=72.0):
    """
    Simplified tip-noise SPL estimate and blade passage frequency.

    Parameters
    ----------
    V_tip_ms  : rotor tip speed (m/s)
    N_blades  : blade count per rotor
    RPM       : rotational speed
    V_ref     : reference tip speed for SPL baseline (m/s)
    SPL_ref   : SPL at reference tip speed (dB)

    Returns
    -------
    dict of Mach, BPF, estimated SPL, and status
    """
    c_sound = 343.0
    M_tip   = V_tip_ms / c_sound
    BPF     = N_blades * RPM / 60.0
    SPL     = SPL_ref + 50.0 * np.log10(V_tip_ms / V_ref)

    if M_tip > 0.70:
        status = "CRITICAL: shock formation, severe noise and power rise"
    elif M_tip > 0.60:
        status = "CAUTION: compressibility effects beginning"
    else:
        status = "Nominal — within subsonic design envelope"

    return {
        "M_tip"     : round(M_tip, 4),
        "BPF_Hz"    : round(BPF, 1),
        "SPL_dB"    : round(SPL, 1),
        "status"    : status
    }


# Design study: tip speed effect at constant RPM
print("Tip speed acoustic sensitivity (5 blades, 1200 RPM):")
for Vt in [150, 160, 170, 180, 190, 200, 220]:
    r = acoustic_estimate(Vt, N_blades=5, RPM=1200)
    print(f"  Vt={Vt:<4} m/s → M={r['M_tip']:.3f}, "
          f"BPF={r['BPF_Hz']:.0f} Hz, "
          f"SPL≈{r['SPL_dB']:.1f} dB  [{r['status'][:8]}]")

# Blade count effect on BPF at same tip speed
print("\nBlade count effect on BPF (Vt=180 m/s, R=1.0 m):")
Omega = 180.0 / 1.0  # rad/s from tip speed
RPM   = Omega * 60 / (2 * np.pi)
for nb in [2, 3, 4, 5, 6, 7, 8]:
    r = acoustic_estimate(180, N_blades=nb, RPM=RPM)
    print(f"  N_b={nb} → BPF={r['BPF_Hz']:.0f} Hz")

6.5 Blade Geometry for Noise Reduction

STANDARD RECTANGULAR TIP        SWEPT / TAPERED TIP
─────────────────────────        ───────────────────
   ────────────────────             ──────────────┐
   ────────────────────             ──────────────┘  ← 20–35° aft sweep
   ────────────────────             ──────────────┐    at outer 20% span
   ────────────────────             ──────────────┘

High BVI impulsive noise          BVI impulse reduced
Coherent bound vortex             Vortex core diffused
Flat TE radiates strongly         Serrated TE scatters energy

SERRATED TRAILING EDGE (enlargement):

   ─────────────/\/\/\/\/\/\/\/\
                ^^^^^^^^^^^^^^^^
   Tooth depth: 0.5–1.5% chord
   Tooth pitch: 3–8× boundary layer thickness
   Effect: +2 to +4 dB broadband reduction

7. Structural Philosophy — Why Carbon Is Not Enough

7.1 The Multi-Objective Problem

An eVTOL airframe must be simultaneously:

Stiff — to resist aeroelastic divergence and rotor-induced vibration
Light — to preserve payload fraction and range
Crash-worthy — to protect occupants in a VTOL impact (near-zero horizontal velocity, high vertical sink rate)
Manufacturable — at series production rates, not aerospace prototype rates

These requirements are not all compatible with the same laminate design. High axial stiffness favours 0° plies; high in-plane shear stiffness favours ±45° plies; interlaminar fracture toughness favours woven fabrics; crash energy absorption favours progressive crushing — which requires a different fibre architecture than a highly stiff panel.

7.2 Classical Laminate Theory (CLT)

Carbon fibre composites derive their structural efficiency from the designer's ability to orient fibres in the direction of principal stress. The stiffness of a symmetric balanced laminate is described by the ABD matrix, relating applied loads to midplane strains and curvatures:

[ N ]   [ A  B ] [ ε⁰ ]
[ M ] = [ B  D ] [ κ  ]        ... (Eq. 10)

Matrix	Name	Description
`A` (3×3)	Extensional stiffness	Couples in-plane loads to midplane strains [N/m]
`B` (3×3)	Coupling stiffness	Couples bending to in-plane loads [N]
`D` (3×3)	Bending stiffness	Couples bending moments to curvature [N·m]

For symmetric laminates B = 0 — tension and bending are decoupled. This is the standard design target; asymmetric laminates warp under thermal cycling.

Each ply's contribution to the D-matrix is weighted by its position from the midplane cubed — the outer plies are approximately 8× more effective in bending than plies at the midplane.

The reduced stiffness matrix Q for a single ply oriented at angle θ from the laminate reference axis:

Q̄_ij = T_ij⁻¹ · Q_kl · T_kl      where T is the rotation transformation

7.3 Python: ABD Stiffness Matrix Builder

import numpy as np


def ply_Q(E1, E2, nu12, G12):
    """
    Reduced stiffness matrix [Q] for a single orthotropic ply
    in its principal material axes.

    Parameters
    ----------
    E1, E2 : Young's moduli along and transverse to fibre (Pa)
    nu12   : major Poisson's ratio
    G12    : in-plane shear modulus (Pa)
    """
    nu21  = nu12 * E2 / E1
    denom = 1.0 - nu12 * nu21
    return np.array([
        [E1 / denom,       nu12 * E2 / denom, 0.0 ],
        [nu21 * E1 / denom, E2 / denom,        0.0 ],
        [0.0,               0.0,                G12]
    ])


def rotate_Q(Q, theta_deg):
    """
    Transform [Q] from ply axes to laminate reference axes
    at orientation angle θ (degrees).
    """
    th = np.radians(theta_deg)
    c, s = np.cos(th), np.sin(th)
    T = np.array([
        [ c**2,  s**2,   2*c*s  ],
        [ s**2,  c**2,  -2*c*s  ],
        [-c*s,   c*s,  c**2-s**2]
    ])
    return np.linalg.inv(T) @ Q @ T


def ABD_matrix(ply_stack):
    """
    Compute the full 6×6 ABD stiffness matrix for a laminate.

    Parameters
    ----------
    ply_stack : list of tuples (E1, E2, nu12, G12, theta_deg, thickness)
                All moduli in Pa, thickness in m.

    Returns
    -------
    ABD : 6×6 numpy array [A|B ; B|D]
    """
    total_t = sum(p[5] for p in ply_stack)
    z       = -total_t / 2.0          # midplane at z = 0

    A = np.zeros((3, 3))
    B = np.zeros((3, 3))
    D = np.zeros((3, 3))

    for (E1, E2, nu12, G12, angle, t) in ply_stack:
        Q_bar = rotate_Q(ply_Q(E1, E2, nu12, G12), angle)
        z_k   = z + t

        A += Q_bar * t
        B += Q_bar * 0.5 * (z_k**2 - z**2)
        D += Q_bar * (1.0 / 3.0) * (z_k**3 - z**3)

        z = z_k

    return np.block([[A, B], [B, D]])


# ─── IM7/8552 CFRP laminate: [0/+45/−45/90]s ───────────────────────────────
# Material properties (cell values)
E1  = 165e9   # Pa — fibre direction modulus
E2  = 9.0e9   # Pa — transverse modulus
nu12 = 0.34   # major Poisson's ratio
G12  = 5.6e9  # Pa — shear modulus
t_ply = 0.127e-3   # m — ply thickness (0.127 mm prepreg)

# IM7/8552 ply properties tuple
cf = (E1, E2, nu12, G12)

# Symmetric [0/+45/−45/90]s layup (8 plies)
layup = [
    cf + (  0,  t_ply),   # outer 0°
    cf + ( 45,  t_ply),   # +45°
    cf + (-45,  t_ply),   # −45°
    cf + ( 90,  t_ply),   # 90° — midplane side
    cf + ( 90,  t_ply),   # 90° — midplane side (symmetric)
    cf + (-45,  t_ply),   # −45°
    cf + ( 45,  t_ply),   # +45°
    cf + (  0,  t_ply),   # outer 0°
]

ABD = ABD_matrix(layup)

print("ABD Matrix Summary — [0/+45/-45/90]s IM7/8552")
print("─" * 50)
print(f"  A11 (extensional x-dir) : {ABD[0,0]/1e6:>10.2f} MN/m")
print(f"  A22 (extensional y-dir) : {ABD[1,1]/1e6:>10.2f} MN/m")
print(f"  A66 (in-plane shear)    : {ABD[2,2]/1e6:>10.2f} MN/m")
print(f"  D11 (bending x-dir)     : {ABD[3,3]:>10.2f} N·m")
print(f"  D22 (bending y-dir)     : {ABD[4,4]:>10.2f} N·m")
print(f"  B_max (coupling)        : {np.max(np.abs(ABD[:3,3:])):>10.4f} N")
print(f"  Laminate thickness      : {8*t_ply*1000:.3f} mm")

7.4 Structural Batteries — The Frontier

The most aggressive mass-saving strategy under active research is multifunctional structure: laminate panels that simultaneously bear mechanical load and store electrochemical energy. A structural battery replaces passive glass-fibre or foam core material with lithium-ion cells embedded in a carbon-fibre matrix.

CONVENTIONAL ARRANGEMENT:          STRUCTURAL BATTERY:
──────────────────────────          ─────────────────────────────────
  [CFRP face sheet]                  [CFRP + anode current collector]
  [Foam/honeycomb core]              [Solid-state electrolyte matrix]
  [Battery pack — separate]          [Cathode particles in CFRP binder]
  [CFRP face sheet]                  [CFRP + cathode current collector]

  Total mass: m_struct + m_bat       Total mass: m_combined (single layer)
  Total stiffness: EI_struct         Stiffness: ~70% of monolithic CFRP

At the cell level, the carbon fibre tow doubles as the anode current collector. Current demonstrators achieve approximately 24 Wh/kg structural energy density while retaining ~70% of monolithic laminate stiffness. Certification of such a dual-function safety-critical component is the binding constraint — not the material performance itself.

8. The Cascade — How Everything Connects

Every design choice in an eVTOL is a node in a dependency graph. Changing one node changes all of them. The cascade begins at the cell:

  Battery energy density  (E_spec)
       │
       ▼
  Mission range  ←── shapes viable route network ←── defines business case
       │
       ▼
  Battery mass fraction  (f_bat)
       │
       ├──► Structural mass fraction  →  rotor disc size constraint
       │
       └──► Available payload fraction  →  revenue per flight
                 │
                 ▼
            Rotor disc loading  (DL)
                 │
                 ├──► Hover power draw  →  battery sizing (circular dependency)
                 │
                 └──► Blade passage frequency  →  acoustic loading on structure
                            │
                            ▼
                       Tip speed limit  ←── urban noise target (65 dBA)
                            │
                            ▼
                       Advancing blade Mach  →  transition corridor instability
                                                       │
                                                       ▼
                                                  INDI controller design
                                                  bandwidth requirement
                                                       │
                                                       ▼
                                                  IMU + actuator spec
                                                  (weight penalty)
                                                       │
                                                       └── feeds back into
                                                           structural mass

The loop is not a vicious cycle — it is a coupled optimisation problem. Every advance in one subsystem propagates through all the others.

A 30% improvement in battery energy density does not yield 30% more range in isolation. It relaxes the mass fraction constraint, which can be reinvested into larger rotor radius (reducing disc loading and hover power), which reduces acoustic loading at cruise (potentially permitting a higher tip speed), which eases the transition corridor instability (reducing INDI bandwidth requirements), which reduces the IMU and actuator specification mass, which frees structural mass for further rotor growth. The whole system breathes together.

The critical understanding is that no single technology "solves" eVTOL. The vehicle that wins is the one whose design team understands the dependency graph most clearly — and optimises across all of it simultaneously rather than pushing one node to its limit while ignoring the others.

Summary — Key Equations Reference

Eq.	Formula	Governs
1	`P_ideal = T^(3/2) / √(2ρA)`	Fundamental hover power
2	`v_i = √(DL / 2ρ)`	Induced velocity
3	`P_elec = P_ideal / (FM · η)`	Electrical hover power
4	`FM = P_ideal / P_actual`	Rotor quality metric
5	`M_tip = (Ω·R + V·cosα) / c∞`	Compressibility limit
6	`R = η · E_spec · f_bat · (L/D) / g`	Electric range
7	`Δu = G⁻¹(ν − ẍ_meas)`	INDI incremental control
8	`ΔSPL ≈ 50·log₁₀(Vt2/Vt1)`	Tip speed noise law
9	`f_BPF = N_b · n / 60`	Blade passage frequency
10	`[N;M] = [A B; B D][ε⁰;κ]`	Laminate stiffness

Technical content compiled for engineering and research purposes. All formulations are derived from first principles in aerodynamics, energy systems, structural mechanics, and control theory. Aerospace Alkemists — Nairobi, Kenya.

From Chaos to Clarity: How SQL Transforms Raw Data into Powerful Stories

Collins Njeru — Mon, 27 Apr 2026 05:33:32 +0000

By Collins Njeru

I still remember the first time I ran a query that actually meant something. Not a tutorial exercise with fake names and random numbers , but a real question, against real data, that came back with a real answer. That moment is hard to describe. It feels less like writing code and more like having a conversation with your database.

That is what SQL really is. A conversation. And like any conversation, the quality of what you get back depends entirely on how well you frame what you are asking.

This article is about learning to ask better questions. We will use a retail sales database - customers, products, sales records, inventory and work through the full range of SQL from the basics all the way to the kind of queries that make a hiring manager lean forward in their chair.

The Database We Are Working With

Before writing a single query, you need to understand your data. That sounds obvious, but you would be surprised how many people skip this step and wonder why their results look off.

Our database has four tables:

customers - 50 records. Each one has a name, email, phone number, the date they registered, and a membership tier: Bronze, Silver, or Gold.
products - 15 items spread across Electronics, Appliances, and Accessories. Prices range from a $25 wireless mouse all the way to a $1,500 smart TV.
sales - 15 transactions. Each one links a customer to a product, records how many units were sold, the date, and the total amount.
inventory - stock levels for each product.

Small dataset. Big concepts. The patterns you learn here work exactly the same way when you are dealing with millions of rows.

Start Simple, Stay Curious

Every SQL query starts the same way , you ask for something, and the database gives it back to you.

SELECT * FROM customers;

SELECT product_name, price
FROM products
WHERE price > 500;

That second query is where things start getting interesting. You are not just pulling data , you are filtering it. You are asking a specific question: which products cost more than $500? That shift in thinking, from "give me everything" to "give me exactly this," is the foundation of good data analysis.

Aggregates take it a step further. Instead of looking at individual rows, you start summarising them:

SELECT COUNT(*) AS total_products FROM products;
SELECT AVG(price) AS average_price FROM products;
SELECT SUM(total_amount) AS total_revenue FROM sales;

One thing that trips people up early on is the difference between WHERE and HAVING. They look similar and both filter data, but they work at completely different stages. WHERE runs before any grouping happens , it narrows down the rows going in. HAVING runs after grouping , it filters the results that came out. Mix them up and you either get an error or, worse, results that look right but are actually wrong.

SELECT product_id, SUM(quantity_sold) AS total_qty
FROM sales
GROUP BY product_id
HAVING SUM(quantity_sold) >= 2;

JOINs — Where the Real Magic Happens

Here is the thing about relational databases: the interesting answers almost never live in a single table. You need to pull things together. That is what JOINs do.

The most common one is the INNER JOIN , it only returns rows where there is a match on both sides:

SELECT 
    c.first_name || ' ' || c.last_name AS full_name,
    p.product_name,
    s.quantity_sold
FROM customers c
JOIN sales s ON c.customer_id = s.customer_id
JOIN products p ON s.product_id = p.product_id;

That one query answers: who bought what, and how much of it? Three tables, one clean result.

LEFT JOINs are for when you need to keep rows even if there is no match on the other side. The classic use case is finding gaps , things that should have happened but did not:

-- Products that have never been sold
SELECT p.*
FROM products p
LEFT JOIN sales s ON p.product_id = s.product_id
WHERE s.product_id IS NULL;

This is called an anti-join. You are not looking for what matches,you are looking for what does not. It is one of the most useful patterns in SQL and one of the least talked about in beginner tutorials.

There is also the self-join, which sounds strange until you need it. Joining a table to itself lets you compare rows within the same dataset , like finding all pairs of customers who share the same membership tier:

SELECT 
    c1.first_name || ' ' || c1.last_name AS customer1,
    c2.first_name || ' ' || c2.last_name AS customer2,
    c1.membership_status
FROM customers c1
JOIN customers c2 ON c1.membership_status = c2.membership_status
WHERE c1.customer_id < c2.customer_id;

That c1.customer_id < c2.customer_id at the end is easy to miss but important — without it, you get every customer paired with themselves, and every pair listed twice.

Subqueries — Answering Questions Inside Questions

Some questions cannot be answered in one shot. You need to calculate something first, then use that result to filter or compare. That is where subqueries come in.

-- Products priced above their own category average
SELECT product_name, category, price
FROM products p
WHERE p.price > (
    SELECT AVG(p2.price)
    FROM products p2
    WHERE p2.category = p.category
);

The inner query runs first and figures out the average price per category. The outer query then compares each product against that benchmark. This is called a correlated subquery because the inner part depends on the outer part , specifically on p.category. It is powerful, though on very large tables it can be slow. That is a trade-off worth knowing.

NOT EXISTS is one of the cleanest constructs in SQL for its clarity:

-- Customers who never made a purchase
SELECT * FROM customers c
WHERE NOT EXISTS (
    SELECT 1 FROM sales s
    WHERE s.customer_id = c.customer_id
);

Notice the SELECT 1 inside , it does not matter what you select there. You are not asking for data. You are just checking: does a matching row exist? If it does not, the customer makes the cut. Clean, readable, efficient.

CTEs — Writing SQL You Can Actually Read Later

Nested subqueries work, but they get ugly fast. Stack a few of them and your query starts looking like a puzzle that even you cannot solve two weeks later.

CTEs — Common Table Expressions fix that. You define named, temporary result sets using the WITH keyword and refer to them cleanly in the main query. The logic does not change. The readability does.

Without CTE:

SELECT * FROM (
    SELECT customer_id, SUM(total_amount) AS total_spent
    FROM sales
    GROUP BY customer_id
) sub
WHERE total_spent > (
    SELECT AVG(total_spent) FROM (
        SELECT SUM(total_amount) AS total_spent
        FROM sales GROUP BY customer_id
    ) inner_sub
);

With CTE:

WITH customer_spending AS (
    SELECT customer_id, SUM(total_amount) AS total_spent
    FROM sales
    GROUP BY customer_id
),
avg_spending AS (
    SELECT AVG(total_spent) AS avg_spent FROM customer_spending
)
SELECT * FROM customer_spending
WHERE total_spent > (SELECT avg_spent FROM avg_spending);

Same result. The CTE version reads almost like plain English ,here is the spending per customer, here is the average, now show me everyone above it. That is the kind of code that survives handovers and 3am debugging sessions.

Window Functions — The Thing That Changes How You Think About SQL

If there is one part of SQL that genuinely changes the way you approach data problems, it is window functions. Once you understand them, a whole category of questions that used to feel complicated becomes straightforward.

The key difference from GROUP BY: window functions compute across rows without collapsing them. You still see every individual row , you just get extra computed columns alongside them.

Ranking is the simplest example:

SELECT 
    customer_id,
    SUM(total_amount) AS total_spent,
    RANK() OVER (ORDER BY SUM(total_amount) DESC) AS rank
FROM sales
GROUP BY customer_id;

Every customer keeps their own row. You just get a rank column added.

NTILE is great for segmentation:

SELECT 
    customer_id,
    SUM(total_amount) AS total_spent,
    NTILE(4) OVER (ORDER BY SUM(total_amount) DESC) AS quartile
FROM sales
GROUP BY customer_id;

Split your customers into four equal groups by spending. Bucket 1 is your top 25%. That is a customer segmentation model in six lines of SQL.

LAG and LEAD let you look at the row before or after the current one , incredibly useful for trend analysis:

SELECT 
    sale_id,
    sale_date,
    total_amount,
    LAG(total_amount) OVER (ORDER BY sale_date) AS previous_sale,
    LEAD(total_amount) OVER (ORDER BY sale_date) AS next_sale
FROM sales;

And running totals — a staple of any financial report:

SELECT 
    sale_id,
    sale_date,
    total_amount,
    SUM(total_amount) OVER (ORDER BY sale_date) AS running_total
FROM sales;

PARTITION BY is where things get really interesting. It lets you run window functions independently within groups , like ranking products inside each category separately:

SELECT 
    product_name,
    category,
    price,
    RANK() OVER (PARTITION BY category ORDER BY price DESC) AS rank_in_category
FROM products;

Each category gets its own ranking that resets to 1. Electronics and Appliances are ranked independently. People often try to solve this kind of thing with complicated joins or multiple queries, when a single window function handles it cleanly.

CASE WHEN — Making Your Data Tell a Story

Raw numbers are fine for machines. People need context. CASE WHEN is how you translate numbers into meaning:

SELECT 
    product_name,
    price,
    CASE
        WHEN price > 1000 THEN 'Premium'
        WHEN price BETWEEN 500 AND 1000 THEN 'Standard'
        ELSE 'Budget'
    END AS price_category
FROM products;

You can do the same for customers based on their total spending:

SELECT 
    c.customer_id,
    c.first_name,
    COALESCE(SUM(s.total_amount), 0) AS total_spent,
    CASE
        WHEN COALESCE(SUM(s.total_amount), 0) > 20000 THEN 'VIP'
        WHEN COALESCE(SUM(s.total_amount), 0) BETWEEN 10000 AND 20000 THEN 'Regular'
        ELSE 'New'
    END AS customer_level
FROM customers c
LEFT JOIN sales s ON c.customer_id = s.customer_id
GROUP BY c.customer_id, c.first_name;

Notice the COALESCE wrapping the SUM. Customers with zero purchases would return NULL from the sum COALESCE catches that and turns it into 0 before the CASE logic runs. Small detail, big difference in your results.

The Production Stuff Nobody Talks About Enough

Writing queries is one skill. Building something reliable that other people can use is another. These are the tools that close that gap.

Stored functions let you wrap logic into something reusable and callable:

CREATE OR REPLACE FUNCTION get_customer_spending(cust_id INT)
RETURNS NUMERIC LANGUAGE plpgsql AS $$
DECLARE total NUMERIC;
BEGIN
    SELECT COALESCE(SUM(total_amount), 0) INTO total
    FROM sales WHERE customer_id = cust_id;
    RETURN total;
END;
$$;

Now instead of rewriting that aggregation every time, you call get_customer_spending(42). One place to maintain if anything ever changes.

Indexes are a performance tool most beginners ignore until their queries start timing out on real data:

CREATE INDEX IF NOT EXISTS idx_sales_product_id ON sales(product_id);

This tells PostgreSQL to build a sorted lookup structure on that column. JOINs and WHERE filters on indexed columns can run orders of magnitude faster. The trade-off is slightly slower inserts , but for read-heavy analytical workloads, that trade is almost always worth it.

Views are saved queries that behave like tables, great for hiding complexity:

CREATE OR REPLACE VIEW product_revenue AS
SELECT product_id, SUM(total_amount) AS total_revenue
FROM sales
GROUP BY product_id;

And transactions are how you protect your data when multiple things need to happen together:

BEGIN;
    INSERT INTO sales VALUES (1000, 1, 1, 2, CURRENT_DATE, 1999.98);
    UPDATE products SET stock_quantity = stock_quantity - 2 WHERE product_id = 1;
COMMIT;

Either both statements execute, or neither does. A sale gets recorded and stock gets updated or nothing changes at all. That guarantee matters more than most people realise until the day a partial update corrupts their production data.

On Dirty Data — It Will Humble You

No matter how well-designed a database is, real-world data finds a way to be messy. NULLs show up where you did not expect them. Phone numbers are empty strings instead of NULL. Emails have trailing spaces or inconsistent capitalisation. These things happen constantly.

SQL gives you the tools:

SELECT COALESCE(email, 'No Email Provided') AS email FROM customers;
SELECT LOWER(email) AS email FROM customers;
SELECT TRIM(first_name) AS clean_name FROM customers;
SELECT CASE WHEN phone_number = '' THEN NULL ELSE phone_number END AS phone FROM customers;

This is not glamorous work. But dirty data going into an aggregation means wrong numbers going into a report, which means bad decisions being made somewhere upstream. Treating data cleaning seriously is not optional , it is part of the craft.

Three Bugs Worth Learning From

These mistakes show up constantly, even in experienced developers' work:

Forgetting to include non-aggregated columns in GROUP BY:

-- Breaks
SELECT product_id, product_name, SUM(total_amount) FROM sales GROUP BY product_id;

-- Works
SELECT s.product_id, p.product_name, SUM(s.total_amount)
FROM sales s
JOIN products p ON s.product_id = p.product_id
GROUP BY s.product_id, p.product_name;

Using WHERE to filter an aggregate result:

-- Breaks
SELECT product_id, SUM(total_amount) FROM sales WHERE SUM(total_amount) > 1000 GROUP BY product_id;

-- Works
SELECT product_id, SUM(total_amount) FROM sales
GROUP BY product_id HAVING SUM(total_amount) > 1000;

Getting the JOIN condition backwards:

-- Wrong — no error thrown, just nonsense results
JOIN products p ON s.customer_id = p.product_id;

-- Right
JOIN products p ON s.product_id = p.product_id;

That last one is especially sneaky. It does not crash. It just quietly returns results that make no logical sense, and you might not notice until someone is staring at a report asking why the numbers look strange.

Where to Go From Here

SQL has been around since the 1970s and there is a good reason it has outlasted almost every technology trend that has come and gone in that time. It is precise, it is expressive, and it is built around a simple idea , your data has answers in it, and SQL is how you ask for them.

The gap between someone who is just starting and someone who is genuinely good at this is not really about syntax. Syntax is easy to look up. The gap is in how you think. Whether you can take a business question, break it into logical steps, and translate each one into a query. That kind of thinking builds up with practice, with reading other people's SQL, and honestly with making enough mistakes that the right patterns start to feel natural.

Get comfortable with JOINs until they stop feeling complicated. Then go learn window functions properly , that is probably the single investment in SQL that pays off fastest. Everything else tends to fall into place after that.

The data is already there. It is just waiting to be asked the right questions.

Written by **Collins Njeru* — Nairobi, Kenya.*

5 SQL FUNCTIONS EVERY BEGINNER SHOULD KNOW

Collins Njeru — Mon, 20 Apr 2026 08:23:01 +0000

"Data is the new oil — and SQL is the refinery."

— Anonymous Data Engineer

Image: A developer working with databases — the everyday environment of an SQL practitioner.

Introduction
What Is SQL?
Function 1 — COUNT()
Function 2 — SUM()
Function 3 — AVG()
Function 4 — UPPER() and LOWER()
Function 5 — CONCAT()
Combining Functions Together
Best Practices
Conclusion

Introduction

If you have ever opened a spreadsheet and thought, "There has to be a better way to manage this data," — you were right. That better way is SQL (Structured Query Language), the industry-standard language for talking to databases.

Whether you are a data analyst, a backend developer, a business intelligence professional, or simply a curious learner, SQL is one of the most valuable skills you can add to your toolkit. It is estimated that over 75% of companies rely on relational databases for their core operations, and every single one of them uses SQL in some form.

However, learning SQL can feel overwhelming at first. There are dozens of functions, keywords, clauses, and syntax rules to absorb. The good news? You do not need to know everything to be effective. In fact, mastering just five core SQL functions will empower you to answer real business questions, clean messy data, and produce meaningful reports.

In this article, we will explore those five essential SQL functions in depth — with clear syntax breakdowns, real-world examples, practical use cases, and tips to help you avoid common beginner mistakes.

Let us get started.

What Is SQL?

Before diving into the functions, it helps to have a brief refresher on what SQL actually is and how it operates.

SQL stands for Structured Query Language. It is a domain-specific language designed for managing and manipulating data stored in relational database management systems (RDBMS) such as:

MySQL
PostgreSQL
Microsoft SQL Server
SQLite
Oracle Database

Image: A visual representation of how relational databases organize data into tables with rows and columns.

SQL operates on a simple principle: data is organized into tables (similar to spreadsheets), where each table has columns (attributes/fields) and rows (records/entries). You write queries to select, filter, sort, group, insert, update, or delete data from these tables.

SQL commands are generally categorized into:

Category	Full Name	Examples
DQL	Data Query Language	`SELECT`
DML	Data Manipulation Language	`INSERT`, `UPDATE`, `DELETE`
DDL	Data Definition Language	`CREATE`, `ALTER`, `DROP`
DCL	Data Control Language	`GRANT`, `REVOKE`

For this article, we will focus on SQL functions, which are built-in operations that process and return values from your data. Specifically, we will cover the five most beginner-friendly and most commonly used SQL functions.

Function 1 — `COUNT()`

What It Does

The COUNT() function returns the number of rows that match a specified condition. It is one of the most frequently used SQL functions and is essential for generating summary statistics and reports.

Syntax

SELECT COUNT(column_name)
FROM table_name
WHERE condition;

Use COUNT(*) to count all rows, including those with NULL values.
Use COUNT(column_name) to count rows where that column is not NULL.

Real-World Example

Imagine you work at an e-commerce company and you have an orders table like this:

order_id	customer_name	product	status	amount
1	Alice Njeri	Laptop	Completed	85000
2	Brian Otieno	Headphones	Pending	4500
3	Carol Wanjiku	Phone	Completed	35000
4	David Kamau	Keyboard	Cancelled	2800
5	Eve Achieng	Monitor	Completed	22000

Question: How many orders are in the table?

SELECT COUNT(*) AS total_orders
FROM orders;

Result:

total_orders
5

Question: How many orders have the status 'Completed'?

SELECT COUNT(*) AS completed_orders
FROM orders
WHERE status = 'Completed';

Result:

completed_orders
3

Use Cases for `COUNT()`

Counting total customers in a database
Counting how many products are out of stock
Counting the number of users who signed up this month
Counting failed transactions in a payment system

Common Beginner Mistake

Beginners often confuse COUNT(*) with COUNT(column_name). Remember:

-- Counts ALL rows (even if some columns are NULL)
SELECT COUNT(*) FROM employees;

-- Counts only rows where 'email' is NOT NULL
SELECT COUNT(email) FROM employees;

Function 2 — `SUM()`

What It Does

The SUM() function calculates the total sum of a numeric column. It is indispensable for financial reporting, sales tracking, inventory management, and any scenario where you need an aggregate total.

Syntax

SELECT SUM(column_name)
FROM table_name
WHERE condition;

Real-World Example

Using the same orders table from above:

Question: What is the total revenue from all completed orders?

SELECT SUM(amount) AS total_revenue
FROM orders
WHERE status = 'Completed';

Result:

total_revenue
142000

This tells you that completed orders have brought in a total of KES 142,000.

You can also use SUM() with GROUP BY to break totals down by category:

SELECT status, SUM(amount) AS revenue_by_status
FROM orders
GROUP BY status;

Result:

status	revenue_by_status
Completed	142000
Pending	4500
Cancelled	2800

Image: Data analysis and financial summaries are classic use cases for the SUM() function.

Use Cases for `SUM()`

Total sales revenue for a period
Total inventory value in a warehouse
Total hours logged by employees in a week
Total amount of discounts applied to orders

Common Beginner Mistake

SUM() only works with numeric columns. Attempting it on a text column will throw an error. Also, SUM() ignores NULL values automatically — which is usually the desired behavior, but good to know!

-- This will FAIL if 'product' is a text column
SELECT SUM(product) FROM orders; --  Error

-- This is CORRECT
SELECT SUM(amount) FROM orders; --  Works

Function 3 — `AVG()`

What It Does

The AVG() function calculates the arithmetic mean (average) of a set of numeric values. Rather than knowing the total, AVG() tells you the typical or central value — which is often more meaningful for decision-making.

Syntax

SELECT AVG(column_name)
FROM table_name
WHERE condition;

Real-World Example

Question: What is the average order amount across all orders?

SELECT AVG(amount) AS average_order_value
FROM orders;

Result:

average_order_value
29860.00

This tells the business that, on average, each order is worth approximately KES 29,860 — a useful figure for forecasting and setting minimum order targets.

Combining with GROUP BY:

SELECT status, AVG(amount) AS avg_amount_by_status
FROM orders
GROUP BY status;

Result:

status	avg_amount_by_status
Completed	47333.33
Pending	4500.00
Cancelled	2800.00

Use Cases for `AVG()`

Average salary across a department
Average student score in an exam
Average delivery time for shipments
Average customer rating for a product
Average page views per day on a website

Common Beginner Mistake

Like SUM(), AVG() ignores NULL values. This means if some rows in your column are NULL, the average is calculated only from the non-null rows — which may or may not be what you want.

-- If 5 rows exist but only 3 have values, AVG uses 3 rows
SELECT AVG(rating) FROM products; -- Only non-NULL ratings are averaged

If you need to include NULL as zero in your average, use COALESCE():

SELECT AVG(COALESCE(rating, 0)) FROM products;

Function 4 — `UPPER()` and `LOWER()`

What They Do

These are string functions that convert text to either all uppercase (UPPER()) or all lowercase (LOWER()). While they seem simple, they are critical for data cleaning, standardization, and consistent formatting across databases.

Syntax

-- Convert to uppercase
SELECT UPPER(column_name) FROM table_name;

-- Convert to lowercase
SELECT LOWER(column_name) FROM table_name;

Real-World Example

Consider a customers table where email addresses and names have been entered inconsistently:

customer_id	name	email
1	alice njeri	ALICE@GMAIL.COM
2	BRIAN OTIENO	brian@yahoo.com
3	Carol Wanjiku	CAROL@OUTLOOK.COM

Problem: The data is inconsistent — some names are lowercase, some uppercase, and emails are mixed.

Solution: Use LOWER() to standardize emails and UPPER() to format names properly.

-- Standardize all emails to lowercase
SELECT customer_id,
       UPPER(name) AS formatted_name,
       LOWER(email) AS standardized_email
FROM customers;

Result:

customer_id	formatted_name	standardized_email
1	ALICE NJERI	alice@gmail.com
2	BRIAN OTIENO	brian@yahoo.com
3	CAROL WANJIKU	carol@outlook.com

You can also use LOWER() in a WHERE clause for case-insensitive searching:

-- Find customer regardless of how their email was entered
SELECT * FROM customers
WHERE LOWER(email) = 'alice@gmail.com';

This ensures your search works whether the email is stored as ALICE@GMAIL.COM, Alice@Gmail.Com, or alice@gmail.com.

Image: Data cleaning and standardization — exactly what UPPER() and LOWER() help accomplish.

Use Cases for `UPPER()` and `LOWER()`

Standardizing email addresses before sending bulk emails
Normalizing user input for case-insensitive login authentication
Formatting customer names consistently for reports
Ensuring uniform data before comparing or joining tables
Cleaning imported CSV data that has inconsistent casing

Common Beginner Mistake

These functions do not modify the actual data in the database. They only transform the output in your query result. To permanently change the data, you need an UPDATE statement:

-- This only changes the display output, NOT the stored data:
SELECT LOWER(email) FROM customers; --  Display only

-- This permanently updates the stored data:
UPDATE customers
SET email = LOWER(email); --  Permanent change

Function 5 — `CONCAT()`

What It Does

The CONCAT() function joins two or more strings together into a single string. It is incredibly useful for combining columns, formatting output, building dynamic messages, and creating full names or addresses from separate fields.

Syntax

-- Standard CONCAT()
SELECT CONCAT(string1, string2, string3, ...) FROM table_name;

-- Alternative using || operator (works in PostgreSQL and SQLite)
SELECT string1 || string2 FROM table_name;

Real-World Example

Suppose your employees table stores first and last names in separate columns:

emp_id	first_name	last_name	department	city
1	Alice	Njeri	Engineering	Nairobi
2	Brian	Otieno	Marketing	Mombasa
3	Carol	Wanjiku	Finance	Nairobi

Question: Generate a full name and a formatted label for each employee.

SELECT
    emp_id,
    CONCAT(first_name, ' ', last_name) AS full_name,
    CONCAT(first_name, ' works in ', department, ' — ', city) AS employee_summary
FROM employees;

Result:

emp_id	full_name	employee_summary
1	Alice Njeri	Alice works in Engineering — Nairobi
2	Brian Otieno	Brian works in Marketing — Mombasa
3	Carol Wanjiku	Carol works in Finance — Nairobi

You can also combine CONCAT() with UPPER() for even more powerful string formatting:

SELECT
    CONCAT(UPPER(first_name), ' ', UPPER(last_name)) AS full_name_caps
FROM employees;

Result:

full_name_caps
ALICE NJERI
BRIAN OTIENO
CAROL WANJIKU

Use Cases for `CONCAT()`

Building full names from first_name and last_name columns
Creating a full address from street, city, and country fields
Generating personalized email greetings like "Dear Alice Njeri,"
Combining a product code with a category code to form a SKU
Building dynamic SQL strings within stored procedures

Common Beginner Mistake

CONCAT() treats NULL values differently depending on the database:

-- In MySQL, CONCAT() returns NULL if ANY argument is NULL:
SELECT CONCAT('Hello', NULL, 'World'); -- Returns: NULL 

-- Use COALESCE() or CONCAT_WS() to handle NULLs safely:
SELECT CONCAT('Hello', COALESCE(NULL, ''), 'World'); -- Returns: HelloWorld 

-- CONCAT_WS (Concatenate With Separator) skips NULLs automatically:
SELECT CONCAT_WS(' ', first_name, middle_name, last_name) FROM employees; --

Combining Functions Together

The true power of SQL functions emerges when you combine them in a single query. Here is an example that uses COUNT(), SUM(), AVG(), and CONCAT() together in one query to produce a rich report:

SELECT
    department,
    COUNT(*) AS total_employees,
    SUM(salary) AS total_salary_budget,
    ROUND(AVG(salary), 2) AS average_salary,
    CONCAT('Dept: ', UPPER(department)) AS department_label
FROM employees
GROUP BY department
ORDER BY total_salary_budget DESC;

This single query tells you — for each department:

How many employees exist
The total salary budget
The average salary
A nicely formatted department label

This is the kind of query you would write to build a management dashboard or a financial summary report.

Best Practices

1. Always Use Aliases (`AS`)

Use the AS keyword to give your function outputs meaningful names. It makes your results readable and your queries professional.

-- Hard to read:
SELECT COUNT(*), SUM(amount), AVG(amount) FROM orders;

-- Much better:
SELECT 
    COUNT(*) AS total_orders,
    SUM(amount) AS total_revenue,
    AVG(amount) AS avg_order_value
FROM orders;

2. Handle NULL Values Proactively

Use COALESCE() or IFNULL() to deal with NULL values before passing them to aggregate functions, especially when calculating averages or concatenating strings.

3. Use `GROUP BY` with Aggregate Functions

When using COUNT(), SUM(), or AVG() alongside non-aggregate columns, always pair them with a GROUP BY clause.

--  This will cause an error in most databases:
SELECT department, COUNT(*) FROM employees;

--  This is correct:
SELECT department, COUNT(*) FROM employees GROUP BY department;

4. Test on Small Datasets First

Before running aggregate queries on millions of rows, test on a limited subset using LIMIT:

SELECT * FROM orders LIMIT 10;

5. Comment Your Queries

Use -- for single-line comments and /* */ for multi-line comments to document your queries:

-- Get a revenue summary by order status
SELECT 
    status,
    COUNT(*) AS order_count,     -- Number of orders per status
    SUM(amount) AS total_amount  -- Total value per status
FROM orders
GROUP BY status;

Conclusion

SQL does not have to be intimidating. By mastering just five core functions — COUNT(), SUM(), AVG(), UPPER()/LOWER(), and CONCAT() — you already have the tools to:

Summarize and analyze data
Clean and standardize messy datasets
Build meaningful reports and dashboards
Answer real business questions with confidence

SQL IS NOT BAD - It's Running the World And Most Developers Don't Know Half of It....

Collins Njeru — Fri, 17 Apr 2026 12:44:00 +0000

"Every database, every app, every swipe of your card — SQL is there."

The language that outlived every trend since 1974

Written for developers, analysts, and data enthusiasts at every level

From zero to advanced — in one article

Last updated: 2026 | PostgreSQL-first, broadly compatible

"In an era obsessed with the newest JavaScript framework or the latest AI buzzword, SQL has done something remarkable , it has outlived every trend thrown at it since 1974."*

NoSQL was supposed to kill it.

Big Data was supposed to replace it.

Pandas, Spark, and cloud warehouses were going to make it irrelevant.

Yet here we are. SQL is baked into everything from PostgreSQL and MySQL to Snowflake, BigQuery, and DuckDB.

~~SQL is dying.~~ ← False. SQL is thriving.

If you work with data — in any capacity — SQL fluency is not optional. It's oxygen.

The Anatomy of a SQL Query
JOINs Demystified
Aggregation and GROUP BY
Window Functions — SQL's Hidden Superpower
Subqueries vs CTEs
Indexing — Why Your Queries Are Slow
Transactions and ACID
10 SQL Mistakes Hurting Your Performance
Advanced SQL Patterns
The Future of SQL

1. The Anatomy of a SQL Query

Before we run, let's walk. A SQL query has a very specific logical execution order that trips up even experienced developers.

The order you write a query:

SELECT name, COUNT(*) AS total_orders
FROM customers
JOIN orders ON customers.id = orders.customer_id
WHERE customers.country = 'Kenya'
GROUP BY name
HAVING COUNT(*) > 5
ORDER BY total_orders DESC
LIMIT 10;

The order the database executes it:

1. FROM       → Identify the tables
2. JOIN       → Combine related data
3. WHERE      → Filter rows (before grouping)
4. GROUP BY   → Collapse rows into groups
5. HAVING     → Filter groups (after aggregation)
6. SELECT     → Choose columns to return
7. ORDER BY   → Sort the result
8. LIMIT      → Cut result to N rows

Key insight: WHERE runs before GROUP BY. This means you cannot filter on an aggregated column using WHERE. That's what HAVING is for.

Nested tip: Get this order wrong, and you'll spend hours debugging a query that looks perfectly fine.

What You Should Know Before Writing Any Query

Always understand the shape of your data first
Never assume column types — check them
Use EXPLAIN before running on large tables
Limit your results while exploring

Your SQL Query Checklist

[x] Understand the execution order
[x] Know which table is the "driving" table
[x] Add a LIMIT when exploring
[ ] Run EXPLAIN ANALYZE to check the query plan
[ ] Check indexes on JOIN and WHERE columns
[ ] Peer-review complex queries before production

2. JOINs Demystified

JOINs are the heart of relational databases. If you understand them deeply, you understand SQL.

The Main JOIN Types

JOIN Type	What It Returns	Use Case
`INNER JOIN`	Rows matching in both tables	Most common join
`LEFT JOIN`	All left rows + matched right rows	Find unmatched records
`RIGHT JOIN`	All right rows + matched left rows	Rare — prefer LEFT JOIN
`FULL OUTER JOIN`	All rows from both tables	Complete data comparison
`CROSS JOIN`	Cartesian product of both tables	Combinations/permutations
`SELF JOIN`	A table joined to itself	Hierarchies, org charts

Visual Mental Model

Think of two overlapping circles:

INNER JOIN → only the overlap
LEFT JOIN → entire left circle + overlap
FULL OUTER JOIN → both circles entirely

A Common Real-World Example

-- Find customers who have NEVER placed an order
SELECT c.name, c.email
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE o.customer_id IS NULL;

The trick of using LEFT JOIN ... WHERE right_table.id IS NULL to find non-matching rows is one of the most elegant patterns in SQL. Learn it. Use it often.

The SELF JOIN — More Useful Than You Think

-- Find employees and their managers (from the same table)
SELECT 
    e.name AS employee,
    m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;

3. Aggregation and GROUP BY

Aggregation is where SQL transforms raw rows into business intelligence.

Core Aggregate Functions

SELECT
    department,
    COUNT(*)                    AS total_staff,
    AVG(salary)                 AS avg_salary,
    MAX(salary)                 AS highest_salary,
    MIN(salary)                 AS lowest_salary,
    SUM(salary)                 AS total_payroll,
    COUNT(DISTINCT job_title)   AS unique_roles
FROM employees
GROUP BY department
ORDER BY total_payroll DESC;

The Golden Rule of GROUP BY

Every column in SELECT must either be in GROUP BY or be inside an aggregate function.

Wrong vs Right

--  WRONG — 'name' is neither grouped nor aggregated
SELECT department, name, COUNT(*)
FROM employees
GROUP BY department;

--  CORRECT
SELECT department, COUNT(*) AS headcount
FROM employees
GROUP BY department;

Filtering Groups with HAVING

-- Only departments with more than 10 employees earning above $50,000
SELECT department, COUNT(*) AS headcount
FROM employees
WHERE salary > 50000          -- filters rows BEFORE grouping
GROUP BY department
HAVING COUNT(*) > 10          -- filters groups AFTER aggregation
ORDER BY headcount DESC;

4. Window Functions — SQL's Hidden Superpower

If there is one feature that separates SQL beginners from SQL professionals, it is window functions.

Window functions let you perform calculations across related rows — without collapsing the result like GROUP BY does.

The Syntax

function_name() OVER (
    PARTITION BY column     -- Optional: group rows into windows
    ORDER BY column         -- Optional: define order within the window
    ROWS BETWEEN ...        -- Optional: define the window frame
)

Ranking Functions

SELECT
    name,
    department,
    salary,
    RANK()        OVER (PARTITION BY department ORDER BY salary DESC) AS rank,
    DENSE_RANK()  OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rank,
    ROW_NUMBER()  OVER (PARTITION BY department ORDER BY salary DESC) AS row_num
FROM employees;

Difference Between Ranking Functions

Function	Handles Ties	Example Output
`ROW_NUMBER()`	Always unique	1, 2, 3, 4
`RANK()`	Skips after tie	1, 1, 3, 4
`DENSE_RANK()`	No skip after tie	1, 1, 2, 3

Running Totals and Moving Averages

-- Running total of sales with a 7-day moving average
SELECT
    sale_date,
    amount,
    SUM(amount) OVER (ORDER BY sale_date) AS running_total,
    AVG(amount) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_7day_avg
FROM sales;

LAG and LEAD — Comparing to Neighboring Rows

-- Month-over-month revenue growth
SELECT
    month,
    revenue,
    LAG(revenue, 1)  OVER (ORDER BY month) AS prev_month,
    LEAD(revenue, 1) OVER (ORDER BY month) AS next_month,
    revenue - LAG(revenue, 1) OVER (ORDER BY month) AS growth
FROM monthly_revenue;

Window functions are the reason SQL is still king for analytics. No other tool lets you express this kind of computation as cleanly.

5. Subqueries vs CTEs

The Subquery — Powerful but Messy at Scale

A query nested inside another query:

SELECT name, total_spent
FROM (
    SELECT customer_id, SUM(amount) AS total_spent
    FROM orders
    GROUP BY customer_id
) AS customer_totals
JOIN customers ON customers.id = customer_totals.customer_id
WHERE total_spent > (
    SELECT AVG(total_spent) FROM (
        SELECT SUM(amount) AS total_spent
        FROM orders
        GROUP BY customer_id
    ) AS sub
);

This works — but grows unreadable fast.

The CTE (Common Table Expression) — Clean, Named, Reusable

CTEs let you name intermediate results using the WITH keyword:

WITH customer_totals AS (
    SELECT customer_id, SUM(amount) AS total_spent
    FROM orders
    GROUP BY customer_id
),
avg_spend AS (
    SELECT AVG(total_spent) AS avg_val FROM customer_totals
)
SELECT 
    c.name, 
    ct.total_spent
FROM customers c
JOIN customer_totals ct ON c.id = ct.customer_id
CROSS JOIN avg_spend
WHERE ct.total_spent > avg_spend.avg_val
ORDER BY ct.total_spent DESC;

Prefer CTEs over nested subqueries for anything beyond one level. Your teammates — and your future self — will thank you.

Recursive CTEs — Conquering Hierarchical Data

-- Traverse an employee org chart tree
WITH RECURSIVE org_chart AS (
    -- Base case: the CEO (no manager)
    SELECT id, name, manager_id, 0 AS depth
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive case: employees joined to their managers
    SELECT e.id, e.name, e.manager_id, oc.depth + 1
    FROM employees e
    JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT depth, name FROM org_chart ORDER BY depth, name;

When to Use Each

Subquery — one-off, simple, single-level nesting
CTE — multi-step logic, readable complex queries
Recursive CTE — trees, graphs, hierarchies

6. Indexing — Why Your Queries Are Slow

No SQL article is complete without talking about performance. And no performance topic matters more than indexing.

What Is an Index?

An index is a separate data structure (usually a B-tree) that lets the database find rows without scanning every record. Think of it as the index in the back of a textbook.

The Speed Difference Is Staggering

-- Without index — scanning 50 million rows
SELECT * FROM customers WHERE email = 'alice@example.com';
-- ⏱ Time: 8.4 seconds

-- Create the index
CREATE INDEX idx_customers_email ON customers(email);

-- With index — direct lookup
SELECT * FROM customers WHERE email = 'alice@example.com';
-- ⏱ Time: 0.003 seconds

When to Create an Index

Columns used frequently in WHERE clauses
Columns used in JOIN conditions
Columns used in ORDER BY or GROUP BY on large tables
Foreign key columns

When Indexes Hurt

Columns with very low cardinality (e.g., is_active = TRUE/FALSE only)
Tables written to far more often than they're read
Too many indexes slow down INSERT, UPDATE, and DELETE

EXPLAIN — Your Best Debugging Friend

EXPLAIN ANALYZE
SELECT c.name, COUNT(o.id) AS order_count
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE c.country = 'Kenya'
GROUP BY c.name;

Use EXPLAIN ANALYZE to see whether the database does a full table scan (Seq Scan) or uses an index (Index Scan). Never optimize blindly.

7. Transactions and ACID

Every time you move money, book a seat, or submit an order — you're trusting transactions to make sure nothing breaks mid-operation.

The ACID Properties

Property	Meaning	Real-World Guarantee
Atomicity	All-or-nothing	No partial transfers
Consistency	Valid state → valid state	Rules always enforced
Isolation	Concurrent transactions don't interfere	No dirty reads
Durability	Committed data survives crashes	Your data is safe

A Bank Transfer — The Classic Example

BEGIN TRANSACTION;

-- Step 1: Deduct from sender
UPDATE accounts SET balance = balance - 1000 WHERE user_id = 'alice';

-- Step 2: Add to receiver
UPDATE accounts SET balance = balance + 1000 WHERE user_id = 'bob';

-- If both succeed:
COMMIT;

-- If anything fails:
-- ROLLBACK;

Without transactions: a server crash between Step 1 and Step 2 means Alice loses money that Bob never receives.

With transactions: either both steps happen, or neither does.

8. 10 SQL Mistakes Hurting Your Performance

SELECT * — Never in Production

Always name the columns you need. SELECT * fetches unnecessary data, breaks when schemas change, and prevents index-only scans.

The Top 10 Offenders

SELECT * in production code — fetch only what you need
Not filtering before joining — WHERE early, not after millions of rows are joined
Functions on indexed columns in WHERE

   --  Kills the index
   WHERE YEAR(created_at) = 2024

   --  Preserves the index
   WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01'

HAVING for non-aggregated filters — use WHERE before grouping
OR on indexed columns — can prevent index use; try UNION ALL
NOT IN with NULLs — returns no rows if subquery has a NULL; use NOT EXISTS
Missing indexes on foreign keys — always index FK columns used in JOINs
N+1 queries — never query inside a loop; use JOINs or batch queries
Implicit type conversions — comparing VARCHAR to integers forces a cast on every row
Forgetting LIMIT while exploring — 3 rows in dev can be 30 million in prod

9. Advanced SQL Patterns

UPSERT — Insert or Update in One Statement

-- PostgreSQL syntax
INSERT INTO users (id, email, login_count)
VALUES (42, 'alice@example.com', 1)
ON CONFLICT (id) DO UPDATE
SET login_count = users.login_count + 1,
    last_seen   = NOW();

Pivot with CASE WHEN

-- Turn rows into columns
SELECT
    product,
    SUM(CASE WHEN quarter = 'Q1' THEN revenue ELSE 0 END) AS Q1,
    SUM(CASE WHEN quarter = 'Q2' THEN revenue ELSE 0 END) AS Q2,
    SUM(CASE WHEN quarter = 'Q3' THEN revenue ELSE 0 END) AS Q3,
    SUM(CASE WHEN quarter = 'Q4' THEN revenue ELSE 0 END) AS Q4
FROM sales
GROUP BY product;

Find Duplicates

SELECT email, COUNT(*) AS occurrences
FROM users
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY occurrences DESC;

Gap Detection — Finding Missing IDs

SELECT id + 1 AS gap_start
FROM orders o
WHERE NOT EXISTS (
    SELECT 1 FROM orders WHERE id = o.id + 1
)
AND id < (SELECT MAX(id) FROM orders);

Percentile Calculation

SELECT
    department,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary,
    PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY salary) AS p90_salary
FROM employees
GROUP BY department;

10. The Future of SQL

Far from fading, SQL is evolving aggressively:

DuckDB — in-process analytics SQL that runs on your laptop with serious performance
Snowflake / BigQuery / Redshift — cloud data warehouses where SQL scales to petabytes
dbt (data build tool) — SQL as software engineering, with version control and testing
SQL on JSON — modern databases now query deeply nested JSON natively
AI-assisted SQL — LLMs generate SQL, but knowing it deeply makes you 10x better at validating that output

SQL will be the last programming language to die.

And it won't.

Your SQL Learning Roadmap

Beginner

[ ] SELECT, FROM, WHERE basics
[ ] INNER JOIN and LEFT JOIN
[ ] GROUP BY and aggregate functions
[ ] ORDER BY and LIMIT

Intermediate

[ ] All JOIN types
[ ] Subqueries and CTEs
[ ] HAVING vs WHERE
[ ] NULL handling

Advanced

[ ] Window functions (RANK, LAG, LEAD, running totals)
[ ] Recursive CTEs
[ ] Indexing strategy and EXPLAIN
[ ] Transactions and isolation levels
[ ] Query optimization techniques

Conclusion

SQL is not just a tool. It's a way of thinking about data — relationally, declaratively, and powerfully.

The developers and analysts who master SQL don't just write better queries. They:

Think more clearly about data problems
Debug faster
Communicate better across engineering, product, and business teams

The language is 50+ years old and still deeply relevant. That is not an accident — it's proof.

Start writing better SQL today. The data is waiting.

TAMING DATA CHAOS IN POWER BI: A Guide to Joins, Relationships, and Schemas

Collins Njeru — Sun, 29 Mar 2026 19:07:40 +0000

Data modeling is the backbone of effective analytics in Power BI. It defines how tables connect, interact, and provide meaningful insights. Without a proper model, even the most advanced visuals can mislead. This article explores SQL joins, Power BI relationships, schemas, and common modeling practices using a customer dataset as an example.

What is Data Modeling?

Data modeling is the process of structuring data to represent real-world entities and their relationships. In Power BI, this involves:

Tables: Fact tables (transactions, metrics) and Dimension tables (descriptive attributes).
Relationships: Logical connections between tables.
Schemas: The overall design of how tables are organized.

A well-designed model ensures that filters, measures, and visuals behave as expected. Poor modeling often leads to incorrect totals, duplicated counts, or slow performance.

Example Dataset

We’ll use two simple tables:

Customers: CustomerID, Name, Region
Orders: OrderID, CustomerID, OrderDate, Amount

This dataset is small, but it illustrates the principles that scale to enterprise-level models.

SQL Joins Explained

Joins combine data from multiple tables based on a common key. In Power BI, joins are performed in Power Query using Merge Queries.

1. INNER JOIN

Definition: Returns rows with matching keys in both tables.
Example: Customers who placed orders.
Diagram (ASCII):
Use Case: Useful when analyzing only active customers.

2. LEFT JOIN

Definition: Returns all rows from the left table and matching rows from the right.
Example: All customers, with orders if they exist.
Use Case: Identify customers who have not placed orders.

3. RIGHT JOIN

Definition: Returns all rows from the right table and matching rows from the left.
Example: All orders, with customer details if available.
Use Case: Ensures no order is excluded even if customer data is missing.

4. FULL OUTER JOIN

Definition: Returns all rows when there is a match in either table.
Example: All customers and all orders, matched where possible.
Use Case: Data reconciliation across systems.

5. LEFT ANTI JOIN

Definition: Returns rows from the left table that have no match in the right.
Example: Customers who never placed an order.
Use Case: Marketing campaigns targeting inactive customers.

6. RIGHT ANTI JOIN

Definition: Returns rows from the right table that have no match in the left.
Example: Orders without a customer record.
Use Case: Detecting data quality issues.

Power BI Relationships

Relationships define how tables interact in the Model View.

Types of Relationships

One-to-Many (1:M): One customer → many orders. Most common.
Many-to-Many (M:M): Both sides can have multiple matches. Requires bridge tables.
One-to-One (1:1): Rare. One employee → one profile.

Cardinality

Cardinality defines the uniqueness of values in a relationship. For example, CustomerID is unique in Customers but repeats in Orders.

Active vs Inactive Relationships

Active: Default relationship used in visuals.
Inactive: Can be activated using DAX functions like:

CALCULATE(
    SUM(Orders[Amount]),
    USERELATIONSHIP(Customers[CustomerID], Orders[CustomerID])
)

Cross-Filter Direction

Single: Filters flow one way (e.g., Customers → Orders).
Both: Filters flow both ways, useful for complex models but can cause ambiguity.

Joins vs Relationships

Joins: Combine data during query (Power Query).
Relationships: Define logical connections in the data model (Model View).

Think of joins as data preparation and relationships as data modeling. Both are essential, but they serve different purposes.

Fact vs Dimension Tables

Fact Tables: Contain metrics (sales, revenue). Example: Orders.
Dimension Tables: Contain descriptive attributes (customer, product). Example: Customers.

Separating facts and dimensions improves clarity and performance. Facts answer “what happened?” while dimensions answer “who, what, when, where?.

Schemas in Power BI

1. Star Schema

Structure: Central fact table connected to dimension tables.
Use Case: Best practice for performance and clarity.
Example: Orders linked to Customers, Products, Dates**

2. Snowflake Schema

Structure: Dimensions normalized into multiple related tables.
Use Case: When dimensions have hierarchical attributes.
Example: Customers linked to Regions and Countries.

3. Flat Table (DLAT)

Structure: All data in one table.
Use Case: Quick prototypes, but poor for scalability.

Role-Playing Dimensions

Sometimes the same dimension is used multiple times. Example:

Date Dimension: Used for Order Date, Ship Date, and Delivery Date.
Solution: Duplicate the dimension table and rename accordingly:,Date_Order,Date_Ship,Date_Delivery

This avoids ambiguity and allows precise filtering.

Common Modeling Issues

Ambiguous relationships: Multiple paths between tables can confuse filters.
Circular references: Loops in relationships cause errors.
Performance bottlenecks: Using flat tables or M:M relationships excessively.
Inactive filters: Forgetting to activate relationships in DAX.

Step-by-Step in Power BI

Load Data: Import Customers and Orders.
Power Query Joins: Use Merge Queries for SQL-style joins.
Model View: Define relationships (CustomerID → CustomerID).
Schema Design: Organize into star or snowflake schemas.
Validate: Build visuals (e.g., total sales by region) to confirm filters work.

Hands-On DAX Examples

Total Sales by Region

Total Sales = SUM(Orders[Amount])

Customers Without Orders

Inactive Customers =
CALCULATETABLE(
Customers,
NOT(RELATEDTABLE(Orders))

Sales by Order Date vs Ship Date

Sales by Ship Date =
CALCULATE(
SUM(Orders[Amount]),
USERELATIONSHIP(Orders[ShipDate], Date[Date])

Real-Life Use Cases

Retail: Identify customers who haven’t purchased recently (LEFT ANTI JOIN).
Finance: Reconcile transactions across systems (FULL OUTER JOIN).
Logistics: Track shipments using role-playing Date dimensions.
Marketing: Segment customers by region and purchase behavior.

Conclusion

Data modeling in Power BI is about clarity, efficiency, and accuracy. By mastering joins, relationships, schemas, and best practices, you ensure that your dashboards tell the right story. Whether you’re building a star schema or handling role-playing dimensions, thoughtful modeling is the key to reliable insights.

With a clean model, you can confidently answer questions like:

Which regions have the highest sales?
Which customers are inactive?
How do shipping delays affect revenue?

Power BI provides the tools,your job is to design the model that makes the data speak.

LINUX AS THE NERVOUS SYSTEM OF DATA ENGINEERING

Collins Njeru — Sat, 28 Mar 2026 21:26:57 +0000

Data engineering is the backbone of modern data-driven organizations. It enables the collection, transformation, and delivery of data at scale. While tools like Apache Spark, Hadoop, and Kafka are essential, the operating system powering these tools is equally critical.

Linux has emerged as the preferred OS due to its stability, scalability, flexibility, and open-source nature. This article explores Linux’s role in real-world data engineering, including essential skills, workflow management, tool integration, cloud deployment, and practical examples.

WHY LINUX DOMINATES DATA ENINEERING

Linux has become the de facto standard for data engineers due to several key advantages:

Open-Source Flexibility

Fully customizable for specific workloads
Kernel can be optimized for performance
Lightweight distributions work well for containerized workflows

Stability and Uptime

Runs continuously with minimal downtime
Ideal for mission-critical production pipelines

Cost-Effectiveness

Free to use, reducing infrastructure costs
Scales easily without expensive licenses

Community Support

Extensive documentation, forums, and troubleshooting resources
Large community of contributors and developers

CORE LINUX SKILLS FOR DATA ENGINEERS

1. FILE SYSTEM NAVIGATION

# List files
ls

# Change directory
cd /path/to/directory

# Show current working directory
pwd

# Find files
find /path/to/search -name "dataset.csv"

2. PROCESS MANAGEMENT

# Show all running processes
ps aux

# Monitor system resource usage
top

# Kill a specific process
kill <pid>

3. SHELL SCRIPTING

#!/bin/bash
# Download and process data
wget http://example.com/dataset.csv
python process_data.py dataset.csv

4.Permission & Ownership

# Change file permissions
chmod 755 my_file.txt

# Change file ownership
chown user:group my_file.txt

5.Package Management

# Install a package on Debian-based systems
sudo apt install package-name

# Update all packages
sudo apt update && sudo apt upgrade

lINUX IN DATA PIPELINES

1. Scheduling Tasks with Cron

# Edit cron jobs
crontab -e

# Schedule a pipeline to run every hour
0 * * * * /home/user/data_pipeline.sh

2. Automating ETL with Shell Scripts

#!/bin/bash
# Download data
wget http://example.com/data.csv

# Transform data
awk -F, '{print $1, $2, $3}' data.csv > transformed_data.csv

# Load into PostgreSQL
psql -U user -d dbname -c "\copy my_table FROM transformed_data.csv WITH CSV"

3. Logging Pipeline Output

#!/bin/bash
echo "$(date): Pipeline started" >> /var/log/data_pipeline.log
python etl_script.py >> /var/log/data_pipeline.log 2>&1
echo "$(date): Pipeline finished" >> /var/log/data_pipeline.log

INTEGRATION WITH DATA ENGINEERING TOOLS

1.Apache Hadoop

# Execute a Hadoop job
hadoop jar /usr/local/hadoop/hadoop-examples.jar wordcount /input /output

2.Apache Kafka

# Start Zookeeper
bin/zookeeper-server-start.sh config/zookeeper.properties

# Start Kafka broker
bin/kafka-server-start.sh config/server.properties

# Produce messages
bin/kafka-console-producer.sh --topic my_topic --bootstrap-server localhost:9092

# Consume messages
bin/kafka-console-consumer.sh --topic my_topic --from-beginning --bootstrap-server localhost:9092

3.Apache Spark

# Submit a Spark job
spark-submit --master local[4] etl_spark_job.py

4.Docker & Kubernetes

# Build Docker image
docker build -t mydataengineerimage .

# Run Docker container
docker run -d --name data_pipeline_container mydataengineerimage

# Deploy Kubernetes resources
kubectl apply -f data_pipeline_deployment.yaml

LINUX IN CLOUD AND BIG DATA ENVIRONMENTS

1.Cloud Servers and Virtual Machines

# Launch Ubuntu VM on AWS
aws ec2 run-instances \
    --image-id ami-0abcdef1234567890 \
    --count 1 \
    --instance-type t2.medium \
    --key-name MyKeyPair \
    --security-group-ids sg-0123456789abcdef0 \
    --subnet-id subnet-6e7f829e

2.Monitoring System Resources

# CPU usage
top

# Memory usage
free -h

# Disk usage
df -h

3.Debugging and Troubleshooting

Cheking logs

# View system logs
tail -f /var/log/syslog

# View pipeline logs
tail -f /var/log/data_pipeline.log

Killing Stuck Processes

# Find process ID
ps aux | grep etl_script.py

# Kill process
kill -9 <pid>

CHALLENGES OF USING LINUX IN DATA ENGINEERING

Steep learning Curve : Command-line usage can be intimidating for beginners
Debugging Complexity : Requires familiarity with logs,permisssions and processes
Automation Dependency : Heavy reliance on scripts and CLI tools

CONCLUSION

Linux is essential for real-world data engineering. It provides the foundation for stable, scalable, and efficient data pipelines. By mastering Linux skills, data engineers can:

Build robust ETL pipelines
Integrate seamlessly with Hadoop,spark and Kafka
Deploy applications in cloud and containerized environments
Monitor and Troubleshoot complex workflows

IN today's data-driven world,linux is more than an operating system,it is a critical enabler of modern data engineering

Should you join Data Engineering?A guide to the tools you'll use

Collins Njeru — Mon, 16 Mar 2026 12:11:50 +0000

Introduction

Many aspiring technologists find themselves at a crossroad:is data engineering the right career path for me.The hesitation often comes from uncertainty about the tools and technologies involved. This article breaks down the core categories of data engineering tools, giving you a clear picture of what you’ll be working with if you decide to join the field.

Core categories of data engineering tools

1.Data ingestion & Integration

Data engineering starts with collecting information from multiple sources

Fivetran /Stitch/ Hevo Data : Automate extraction from SaaS apps and databases

Apache Kafka : Real-time streaming and event-driven pipelines.

Apache Nifi : Flow-based ingestion and routing.

2.Data storage & Warehousing

Once data is ingested, it needs a reliable home.

Snowflake:Cloud-native warehouse with scalability.

Google BigQuery:Serverless, highly scalable analytics warehouse.

Amazon Redshift :AWS-based warehouse optimized for queries.

3.Data processing & transformation

Raw data must be cleaned and transformed before use.

Apache spark:Distributed computing for batch and streaming.

Hadoop:Large-scale storage and batch processing.

Dbt (Data Build Tool):SQL-based transformations for analytics teams.

4. Workflow & orchestration

Pipelines need automation and scheduling.

Apache Airflow:Workflow automation and DAG scheduling.

Prefect/luigi :Alternatives for managing complex workflows.

5.Infrastructure & Deployment

Behind the scenes, infrastructure ensures scalability.

Docker & Kubernetes:Containerization and orchestration.

Terraform:Infrastructure as Code for cloud resources.

6.Monitoring & Quality

Data must be trustworthy and pipelines reliable.

Great expectations :Data validation and quality checks.

Datadog / Prometheus :Monitoring pipelines and infrastructure

Key Considerations

Scalability: Spark and Snowflake excel with large datasets.

Real-Time vs Batch: Kafka is unmatched for streaming; Hadoop and Spark dominate batch workloads.
Cloud Integration: Align tools with your provider (AWS Redshift, GCP BigQuery, Azure Synapse ).
Cost:Open-source tools are free but require setup; managed services reduce overhead but add licensing costs.

Conclusion

Joining data engineering means stepping into a field where you’ll design the backbone of modern businesses. The tools may seem overwhelming at first, but each one solves a specific problem together, they form a powerful toolkit. If you’re excited about building systems that move, store, and transform data at scale, then data engineering isn’t just a career option; it’s a future-proof calling.

DEV Community: Collins Njeru

THE ENGINEERING OF ELECTRICAL VERTICAL TAKEOFF: A deep technical study

Table of Contents

1. The Physics of Hover

1.1 Disc Loading

1.2 Induced Velocity

1.3 Hover Power and Figure of Merit

1.4 Python: Hover Power Solver

2. Three Architectures, One Contradiction

2.1 Architecture Schematics (ASCII Diagrams)

2.2 The Fundamental Trade-off

3. Rotor Aerodynamics — The Blade-Level Story

3.1 Figure of Merit

3.2 Tip Mach Number — The Hard Ceiling

3.3 Blade Geometry Parameters

3.4 Blade Element Momentum Theory (BEMT)

3.5 Python: Simplified BEMT Hover Analysis

4. The Battery Constraint and Electric Range

4.1 The Electric Breguet Equation

4.2 The Specific Energy Wall

4.3 Python: Electric Range Calculator with Sensitivity

5. Flight Control — Why Linear Theory Breaks Down

5.1 The Transition Problem

5.2 Incremental Nonlinear Dynamic Inversion (INDI)

5.3 Python: INDI Roll Channel

6. Acoustic Engineering — The Whisper Problem

6.1 The Regulatory Target and Why It Is Hard

6.2 The Fifth-Power Law

6.3 Blade Passage Frequency

6.4 Python: Acoustic Estimator

6.5 Blade Geometry for Noise Reduction

7. Structural Philosophy — Why Carbon Is Not Enough

7.1 The Multi-Objective Problem

7.2 Classical Laminate Theory (CLT)

7.3 Python: ABD Stiffness Matrix Builder

7.4 Structural Batteries — The Frontier

8. The Cascade — How Everything Connects

Summary — Key Equations Reference

From Chaos to Clarity: How SQL Transforms Raw Data into Powerful Stories

The Database We Are Working With

Start Simple, Stay Curious

JOINs — Where the Real Magic Happens

Subqueries — Answering Questions Inside Questions

CTEs — Writing SQL You Can Actually Read Later

Window Functions — The Thing That Changes How You Think About SQL

CASE WHEN — Making Your Data Tell a Story

The Production Stuff Nobody Talks About Enough

On Dirty Data — It Will Humble You

Three Bugs Worth Learning From

Where to Go From Here

5 SQL FUNCTIONS EVERY BEGINNER SHOULD KNOW

Table of Contents

Introduction

What Is SQL?

Function 1 — COUNT()

What It Does

Syntax

Real-World Example

Use Cases for COUNT()

Common Beginner Mistake

Function 2 — SUM()

What It Does

Syntax

Real-World Example

Use Cases for SUM()

Common Beginner Mistake

Function 3 — AVG()

What It Does

Syntax

Real-World Example

Use Cases for AVG()

Common Beginner Mistake

Function 4 — UPPER() and LOWER()

What They Do

Syntax

Real-World Example

Use Cases for UPPER() and LOWER()

Common Beginner Mistake

Function 5 — CONCAT()

What It Does

Function 1 — `COUNT()`

Use Cases for `COUNT()`

Function 2 — `SUM()`

Use Cases for `SUM()`

Function 3 — `AVG()`

Use Cases for `AVG()`

Function 4 — `UPPER()` and `LOWER()`

Use Cases for `UPPER()` and `LOWER()`

Function 5 — `CONCAT()`

Use Cases for `CONCAT()`

1. Always Use Aliases (`AS`)

3. Use `GROUP BY` with Aggregate Functions