Atlas Whoff

Posted on Apr 7

Writing SQL That Doesn't Come Back to Haunt You

#sql #database #backend #tutorial

SQL Is Not Self-Documenting

A year from now, you won't remember why you wrote that LEFT JOIN with five conditions. Your colleagues definitely won't.

Good SQL is readable SQL.

Format Like You Mean It

-- Bad: good luck reading this at 9 AM
SELECT u.id,u.email,COUNT(o.id) as order_count,SUM(o.total) as lifetime_value FROM users u LEFT JOIN orders o ON o.user_id=u.id WHERE u.created_at>'2024-01-01' AND u.status='active' GROUP BY u.id,u.email HAVING COUNT(o.id)>0 ORDER BY lifetime_value DESC LIMIT 100;

-- Good: obvious structure
SELECT
  u.id,
  u.email,
  COUNT(o.id)  AS order_count,
  SUM(o.total) AS lifetime_value
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE
  u.created_at > '2024-01-01'
  AND u.status = 'active'
GROUP BY u.id, u.email
HAVING COUNT(o.id) > 0
ORDER BY lifetime_value DESC
LIMIT 100;

CTEs Over Subqueries

-- Hard to follow: nested subqueries
SELECT email, total_revenue
FROM (
  SELECT user_id, SUM(amount) AS total_revenue
  FROM payments
  WHERE status = 'succeeded'
  GROUP BY user_id
  HAVING SUM(amount) > 1000
) high_value_payers
JOIN users ON users.id = high_value_payers.user_id
ORDER BY total_revenue DESC;

-- Easy to follow: CTEs tell a story
WITH high_value_payers AS (
  SELECT
    user_id,
    SUM(amount) AS total_revenue
  FROM payments
  WHERE status = 'succeeded'
  GROUP BY user_id
  HAVING SUM(amount) > 1000
)
SELECT
  u.email,
  hvp.total_revenue
FROM high_value_payers hvp
JOIN users u ON u.id = hvp.user_id
ORDER BY hvp.total_revenue DESC;

CTEs have names. Names communicate intent.

Window Functions Instead of Self-Joins

-- Painful: self-join to get running total
SELECT
  o1.id,
  o1.created_at,
  o1.total,
  SUM(o2.total) AS running_total
FROM orders o1
JOIN orders o2 ON o2.created_at <= o1.created_at AND o2.user_id = o1.user_id
GROUP BY o1.id, o1.created_at, o1.total;

-- Clean: window function
SELECT
  id,
  created_at,
  total,
  SUM(total) OVER (
    PARTITION BY user_id
    ORDER BY created_at
    ROWS UNBOUNDED PRECEDING
  ) AS running_total
FROM orders;

Common window functions:

-- Row number within group
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at)

-- Rank (with gaps for ties)
RANK() OVER (ORDER BY total_revenue DESC)

-- Lead/lag: access adjacent rows
LAG(amount, 1)  OVER (PARTITION BY user_id ORDER BY created_at) AS prev_amount
LEAD(amount, 1) OVER (PARTITION BY user_id ORDER BY created_at) AS next_amount

-- Running percentile
PERCENT_RANK() OVER (ORDER BY score)

Avoid SELECT *

-- Never in production code:
SELECT * FROM users;

-- Be explicit:
SELECT id, email, name, created_at FROM users;

Reasons:

Schema changes break your application silently
Fetches columns you don't need (slower, more memory)
Ambiguous in JOINs (which table's id?)

NULLs Are Tricky

-- This returns no rows even for NULLs!
SELECT * FROM users WHERE deleted_at != '2024-01-01';

-- NULL comparisons need IS NULL / IS NOT NULL
SELECT * FROM users WHERE deleted_at IS NULL OR deleted_at != '2024-01-01';

-- COALESCE: use default when NULL
SELECT COALESCE(display_name, email) AS shown_name FROM users;

-- NULLIF: convert a value to NULL
SELECT NULLIF(status, '') AS status FROM users; -- empty string → NULL

UPSERT Pattern

-- PostgreSQL: INSERT ... ON CONFLICT
INSERT INTO user_preferences (user_id, key, value)
VALUES ('user-123', 'theme', 'dark')
ON CONFLICT (user_id, key)
DO UPDATE SET
  value = EXCLUDED.value,
  updated_at = NOW();

-- Only update if actually changed
INSERT INTO metrics (date, event, count)
VALUES (CURRENT_DATE, 'pageview', 1)
ON CONFLICT (date, event)
DO UPDATE SET count = metrics.count + EXCLUDED.count;

Date Operations

-- Last 30 days
WHERE created_at >= NOW() - INTERVAL '30 days'

-- Truncate to day/week/month
SELECT
  DATE_TRUNC('week', created_at) AS week,
  COUNT(*) AS signups
FROM users
GROUP BY DATE_TRUNC('week', created_at)
ORDER BY week;

-- Age in years
SELECT EXTRACT(YEAR FROM AGE(birth_date)) AS age FROM users;

-- Format for display
SELECT TO_CHAR(created_at, 'YYYY-MM-DD') AS formatted_date FROM orders;

The Index on Every Query Rule

For every query that goes to production:

Run EXPLAIN ANALYZE
Verify no Seq Scan on large tables
Add index if needed

This takes two minutes. The alternative is debugging a slow query in production at 2 AM.

SQL patterns, migration utilities, and query optimization built in: Whoff Agents AI SaaS Starter Kit.

DEV Community