99Tools

Posted on May 19

Stop Writing Ugly SQL: A Developer's Guide to Clean, Readable Queries

#sql #database #webdev #productivity

We've all been there. You open a PR, and the SQL query looks like someone sneezed on the keyboard:

select u.id,u.name,u.email,o.total,o.created_at from users u join orders o on u.id=o.user_id where o.status='completed' and o.created_at >= '2024-01-01' order by o.created_at desc limit 50

It works. It returns the right data. But it's a nightmare to review, debug, or extend.

SQL readability is not a cosmetic concern — it's an engineering discipline. Messy SQL leads to logic errors, missed optimizations, and silent bugs that survive code review because nobody wants to untangle the mess.

This guide walks through the conventions, habits, and formatting rules that make SQL a joy to work with.

Why SQL Formatting Matters More Than You Think

Unlike application code, SQL queries often live in multiple places simultaneously — inside ORM raw query methods, migration files, stored procedures, BI tools, and debug logs. A single ugly query might be read by:

A backend engineer debugging a slow endpoint
A DBA optimizing an index
A data analyst trying to understand business logic
You, six months from now, at 11pm before a deploy

Consistent formatting makes all of these scenarios less painful.

The Core Rules

1. Capitalize SQL keywords

SQL keywords should be uppercase. Column names, table names, and aliases should be lowercase (or match your schema's convention).

Bad:

select id, name from users where active = true

Good:

SELECT id, name
FROM users
WHERE active = true

This instantly distinguishes SQL structure from your data model.

2. One clause per line

Every major clause — SELECT, FROM, JOIN, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT — gets its own line. No exceptions.

Bad:

SELECT u.id, u.name, COUNT(o.id) AS order_count FROM users u LEFT JOIN orders o ON u.id = o.user_id WHERE u.created_at >= '2023-01-01' GROUP BY u.id, u.name HAVING COUNT(o.id) > 5 ORDER BY order_count DESC

Good:

SELECT
    u.id,
    u.name,
    COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at >= '2023-01-01'
GROUP BY
    u.id,
    u.name
HAVING COUNT(o.id) > 5
ORDER BY order_count DESC

The second version can be scanned in seconds. The first requires active decoding.

3. Align your columns in the SELECT list

Put each selected column on its own line, indented consistently (2 or 4 spaces — pick one and stick to it):

SELECT
    account_id,
    account_name,
    region,
    plan_type,
    created_at
FROM accounts
WHERE status = 'active'

This makes it trivial to add, remove, or reorder columns during development.

4. Qualify every column name in joins

Once you have more than one table in a query, always prefix columns with their table alias. Never leave a reader guessing which table id or created_at belongs to.

Ambiguous:

SELECT id, name, total, created_at
FROM users
JOIN orders ON users.id = orders.user_id

Clear:

SELECT
    u.id,
    u.name,
    o.total,
    o.created_at
FROM users u
JOIN orders o ON u.id = o.user_id

This also protects you when schema changes add a new column with a conflicting name — your query won't silently start returning the wrong data.

5. Write meaningful aliases

Single-letter aliases like a, b, x are lazy and harmful. Use short but meaningful abbreviations.

-- Bad
SELECT a.name, b.total
FROM accounts a
JOIN invoices b ON a.id = b.account_id

-- Good
SELECT acc.name, inv.total
FROM accounts acc
JOIN invoices inv ON acc.id = inv.account_id

6. Put JOIN conditions on a separate, indented line (for complex joins)

When join conditions are multi-part or involve expressions, break them out clearly:

SELECT
    p.id,
    p.title,
    c.body AS latest_comment
FROM posts p
LEFT JOIN comments c
    ON c.post_id = p.id
    AND c.created_at = (
        SELECT MAX(created_at)
        FROM comments
        WHERE post_id = p.id
    )
WHERE p.published = true

7. Use CTEs to break up complexity

Common Table Expressions (CTEs) are one of the most underused tools in SQL. Instead of nesting subqueries three levels deep, break the logic into named, readable steps.

Subquery hell:

SELECT avg_revenue, cohort
FROM (
    SELECT cohort, AVG(revenue) AS avg_revenue
    FROM (
        SELECT
            DATE_TRUNC('month', first_purchase_date) AS cohort,
            SUM(order_total) AS revenue
        FROM (
            SELECT u.id, MIN(o.created_at) AS first_purchase_date, o.total AS order_total
            FROM users u
            JOIN orders o ON u.id = o.user_id
            GROUP BY u.id, o.total
        ) user_orders
        GROUP BY cohort
    ) cohort_revenue
    GROUP BY cohort
) final
ORDER BY cohort

CTE version:

WITH user_first_purchase AS (
    SELECT
        u.id AS user_id,
        MIN(o.created_at) AS first_purchase_date
    FROM users u
    JOIN orders o ON u.id = o.user_id
    GROUP BY u.id
),

cohort_revenue AS (
    SELECT
        DATE_TRUNC('month', ufp.first_purchase_date) AS cohort,
        SUM(o.total) AS revenue
    FROM user_first_purchase ufp
    JOIN orders o ON ufp.user_id = o.user_id
    GROUP BY cohort
),

cohort_averages AS (
    SELECT
        cohort,
        AVG(revenue) AS avg_revenue
    FROM cohort_revenue
    GROUP BY cohort
)

SELECT
    cohort,
    avg_revenue
FROM cohort_averages
ORDER BY cohort

The CTE version is longer but infinitely more debuggable. You can run each CTE independently to validate intermediate results. This matters enormously in production debugging.

8. Comment non-obvious logic

SQL is often used for business logic that isn't apparent from the data alone. Comment that context directly in the query.

SELECT
    customer_id,
    SUM(order_total) AS lifetime_value
FROM orders
WHERE
    -- Exclude test orders created by the QA team's shared account
    customer_id != 9999
    -- Only count orders after the platform migration (older data is unreliable)
    AND created_at >= '2022-06-01'
GROUP BY customer_id
HAVING SUM(order_total) > 0

Future you will be grateful.

Refactoring a Real Query: Before and After

Here's a realistic before/after from a production analytics query:

Before:

select u.id,count(distinct s.id) as sessions,count(distinct e.id) as events,max(s.started_at) as last_seen from users u left join sessions s on s.user_id=u.id and s.started_at>now()-interval '30 days' left join events e on e.session_id=s.id and e.type not in ('pageview','heartbeat') where u.plan='pro' and u.deleted_at is null group by u.id having count(distinct s.id)>0 order by sessions desc limit 100

After:

SELECT
    u.id,
    COUNT(DISTINCT s.id)  AS sessions,
    COUNT(DISTINCT e.id)  AS events,
    MAX(s.started_at)     AS last_seen
FROM users u
LEFT JOIN sessions s
    ON s.user_id = u.id
    AND s.started_at > NOW() - INTERVAL '30 days'
LEFT JOIN events e
    ON e.session_id = s.id
    -- Exclude low-signal event types from engagement count
    AND e.type NOT IN ('pageview', 'heartbeat')
WHERE
    u.plan = 'pro'
    AND u.deleted_at IS NULL
GROUP BY u.id
HAVING COUNT(DISTINCT s.id) > 0
ORDER BY sessions DESC
LIMIT 100

Same query. Completely different readability. The second version makes the intent, filters, and join logic obvious at a glance.

Tooling: Format First, Then Review

If you're working with legacy SQL or reviewing someone else's query, formatting it first before reading it is one of the highest-ROI habits you can build. A well-formatted query reveals its structure — and its bugs.

For quick formatting without leaving your browser, this SQL formatter handles most SQL dialects and produces consistently indented, readable output in one click. It's especially useful when you inherit a minified query from a log file or a BI tool export and need to understand it quickly.

For integrating formatting into your dev workflow, the sql-formatter npm package is excellent — it supports PostgreSQL, MySQL, SQLite, BigQuery, and more, and you can hook it into Prettier or a pre-commit check.

Conventions Worth Standardizing Across Your Team

If you're working in a team, agree on these and write them down:

Convention	Recommended Default
Keyword casing	`UPPERCASE`
Identifier casing	`snake_case` (match your schema)
Indentation	4 spaces
Alias style	Short abbreviation, not single letter
Column per line	Always in `SELECT` with 3+ columns
CTE preference	Always over 2+ levels of subquery nesting

Enforcing these via a linter (e.g. SQLFluff) rather than hoping for consistency is even better — SQLFluff supports dozens of dialects and integrates with CI pipelines cleanly.

Quick Reference Cheat Sheet

-- ✅ Correct structure
WITH cte_name AS (
    SELECT
        column_one,
        column_two,
        column_three
    FROM table_name tn
    JOIN other_table ot ON tn.id = ot.table_id
    WHERE tn.some_condition = true
)

SELECT
    c.column_one,
    c.column_two,
    c.column_three,
    COUNT(t.id) AS related_count
FROM cte_name c
LEFT JOIN third_table t ON c.column_one = t.foreign_key
GROUP BY
    c.column_one,
    c.column_two,
    c.column_three
ORDER BY c.column_one ASC
LIMIT 1000

Final Thoughts

SQL style isn't about personal preference — it's about the next person (including yourself) who has to read, debug, or extend your query under pressure. Clean SQL is a form of documentation.

The conventions here are not the only valid choices. What matters is consistency. Pick a style, document it, and automate enforcement where possible.

If you have conventions or habits that have saved you in production, drop them in the comments — I'd love to see what others have standardized.

Top comments (2)

Shubhra Pokhariya • May 20

CTEs are honestly one of those things that once you start using them you can never go back. The ability to run each step independently while debugging is what sold me on them. Great writeup, the before/after on that analytics query really drives home how much difference formatting makes.

leob • May 19

Nice!

I think I always do all of the above - but, I didn't know about Common Table Expressions (CTEs) !