We've all been there. You open a PR, and the SQL query looks like someone sneezed on the keyboard:
select u.id,u.name,u.email,o.total,o.created_at from users u join orders o on u.id=o.user_id where o.status='completed' and o.created_at >= '2024-01-01' order by o.created_at desc limit 50
It works. It returns the right data. But it's a nightmare to review, debug, or extend.
SQL readability is not a cosmetic concern — it's an engineering discipline. Messy SQL leads to logic errors, missed optimizations, and silent bugs that survive code review because nobody wants to untangle the mess.
This guide walks through the conventions, habits, and formatting rules that make SQL a joy to work with.
Why SQL Formatting Matters More Than You Think
Unlike application code, SQL queries often live in multiple places simultaneously — inside ORM raw query methods, migration files, stored procedures, BI tools, and debug logs. A single ugly query might be read by:
- A backend engineer debugging a slow endpoint
- A DBA optimizing an index
- A data analyst trying to understand business logic
- You, six months from now, at 11pm before a deploy
Consistent formatting makes all of these scenarios less painful.
The Core Rules
1. Capitalize SQL keywords
SQL keywords should be uppercase. Column names, table names, and aliases should be lowercase (or match your schema's convention).
Bad:
select id, name from users where active = true
Good:
SELECT id, name
FROM users
WHERE active = true
This instantly distinguishes SQL structure from your data model.
2. One clause per line
Every major clause — SELECT, FROM, JOIN, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT — gets its own line. No exceptions.
Bad:
SELECT u.id, u.name, COUNT(o.id) AS order_count FROM users u LEFT JOIN orders o ON u.id = o.user_id WHERE u.created_at >= '2023-01-01' GROUP BY u.id, u.name HAVING COUNT(o.id) > 5 ORDER BY order_count DESC
Good:
SELECT
u.id,
u.name,
COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at >= '2023-01-01'
GROUP BY
u.id,
u.name
HAVING COUNT(o.id) > 5
ORDER BY order_count DESC
The second version can be scanned in seconds. The first requires active decoding.
3. Align your columns in the SELECT list
Put each selected column on its own line, indented consistently (2 or 4 spaces — pick one and stick to it):
SELECT
account_id,
account_name,
region,
plan_type,
created_at
FROM accounts
WHERE status = 'active'
This makes it trivial to add, remove, or reorder columns during development.
4. Qualify every column name in joins
Once you have more than one table in a query, always prefix columns with their table alias. Never leave a reader guessing which table id or created_at belongs to.
Ambiguous:
SELECT id, name, total, created_at
FROM users
JOIN orders ON users.id = orders.user_id
Clear:
SELECT
u.id,
u.name,
o.total,
o.created_at
FROM users u
JOIN orders o ON u.id = o.user_id
This also protects you when schema changes add a new column with a conflicting name — your query won't silently start returning the wrong data.
5. Write meaningful aliases
Single-letter aliases like a, b, x are lazy and harmful. Use short but meaningful abbreviations.
-- Bad
SELECT a.name, b.total
FROM accounts a
JOIN invoices b ON a.id = b.account_id
-- Good
SELECT acc.name, inv.total
FROM accounts acc
JOIN invoices inv ON acc.id = inv.account_id
6. Put JOIN conditions on a separate, indented line (for complex joins)
When join conditions are multi-part or involve expressions, break them out clearly:
SELECT
p.id,
p.title,
c.body AS latest_comment
FROM posts p
LEFT JOIN comments c
ON c.post_id = p.id
AND c.created_at = (
SELECT MAX(created_at)
FROM comments
WHERE post_id = p.id
)
WHERE p.published = true
7. Use CTEs to break up complexity
Common Table Expressions (CTEs) are one of the most underused tools in SQL. Instead of nesting subqueries three levels deep, break the logic into named, readable steps.
Subquery hell:
SELECT avg_revenue, cohort
FROM (
SELECT cohort, AVG(revenue) AS avg_revenue
FROM (
SELECT
DATE_TRUNC('month', first_purchase_date) AS cohort,
SUM(order_total) AS revenue
FROM (
SELECT u.id, MIN(o.created_at) AS first_purchase_date, o.total AS order_total
FROM users u
JOIN orders o ON u.id = o.user_id
GROUP BY u.id, o.total
) user_orders
GROUP BY cohort
) cohort_revenue
GROUP BY cohort
) final
ORDER BY cohort
CTE version:
WITH user_first_purchase AS (
SELECT
u.id AS user_id,
MIN(o.created_at) AS first_purchase_date
FROM users u
JOIN orders o ON u.id = o.user_id
GROUP BY u.id
),
cohort_revenue AS (
SELECT
DATE_TRUNC('month', ufp.first_purchase_date) AS cohort,
SUM(o.total) AS revenue
FROM user_first_purchase ufp
JOIN orders o ON ufp.user_id = o.user_id
GROUP BY cohort
),
cohort_averages AS (
SELECT
cohort,
AVG(revenue) AS avg_revenue
FROM cohort_revenue
GROUP BY cohort
)
SELECT
cohort,
avg_revenue
FROM cohort_averages
ORDER BY cohort
The CTE version is longer but infinitely more debuggable. You can run each CTE independently to validate intermediate results. This matters enormously in production debugging.
8. Comment non-obvious logic
SQL is often used for business logic that isn't apparent from the data alone. Comment that context directly in the query.
SELECT
customer_id,
SUM(order_total) AS lifetime_value
FROM orders
WHERE
-- Exclude test orders created by the QA team's shared account
customer_id != 9999
-- Only count orders after the platform migration (older data is unreliable)
AND created_at >= '2022-06-01'
GROUP BY customer_id
HAVING SUM(order_total) > 0
Future you will be grateful.
Refactoring a Real Query: Before and After
Here's a realistic before/after from a production analytics query:
Before:
select u.id,count(distinct s.id) as sessions,count(distinct e.id) as events,max(s.started_at) as last_seen from users u left join sessions s on s.user_id=u.id and s.started_at>now()-interval '30 days' left join events e on e.session_id=s.id and e.type not in ('pageview','heartbeat') where u.plan='pro' and u.deleted_at is null group by u.id having count(distinct s.id)>0 order by sessions desc limit 100
After:
SELECT
u.id,
COUNT(DISTINCT s.id) AS sessions,
COUNT(DISTINCT e.id) AS events,
MAX(s.started_at) AS last_seen
FROM users u
LEFT JOIN sessions s
ON s.user_id = u.id
AND s.started_at > NOW() - INTERVAL '30 days'
LEFT JOIN events e
ON e.session_id = s.id
-- Exclude low-signal event types from engagement count
AND e.type NOT IN ('pageview', 'heartbeat')
WHERE
u.plan = 'pro'
AND u.deleted_at IS NULL
GROUP BY u.id
HAVING COUNT(DISTINCT s.id) > 0
ORDER BY sessions DESC
LIMIT 100
Same query. Completely different readability. The second version makes the intent, filters, and join logic obvious at a glance.
Tooling: Format First, Then Review
If you're working with legacy SQL or reviewing someone else's query, formatting it first before reading it is one of the highest-ROI habits you can build. A well-formatted query reveals its structure — and its bugs.
For quick formatting without leaving your browser, this SQL formatter handles most SQL dialects and produces consistently indented, readable output in one click. It's especially useful when you inherit a minified query from a log file or a BI tool export and need to understand it quickly.
For integrating formatting into your dev workflow, the sql-formatter npm package is excellent — it supports PostgreSQL, MySQL, SQLite, BigQuery, and more, and you can hook it into Prettier or a pre-commit check.
Conventions Worth Standardizing Across Your Team
If you're working in a team, agree on these and write them down:
| Convention | Recommended Default |
|---|---|
| Keyword casing | UPPERCASE |
| Identifier casing |
snake_case (match your schema) |
| Indentation | 4 spaces |
| Alias style | Short abbreviation, not single letter |
| Column per line | Always in SELECT with 3+ columns |
| CTE preference | Always over 2+ levels of subquery nesting |
Enforcing these via a linter (e.g. SQLFluff) rather than hoping for consistency is even better — SQLFluff supports dozens of dialects and integrates with CI pipelines cleanly.
Quick Reference Cheat Sheet
-- ✅ Correct structure
WITH cte_name AS (
SELECT
column_one,
column_two,
column_three
FROM table_name tn
JOIN other_table ot ON tn.id = ot.table_id
WHERE tn.some_condition = true
)
SELECT
c.column_one,
c.column_two,
c.column_three,
COUNT(t.id) AS related_count
FROM cte_name c
LEFT JOIN third_table t ON c.column_one = t.foreign_key
GROUP BY
c.column_one,
c.column_two,
c.column_three
ORDER BY c.column_one ASC
LIMIT 1000
Final Thoughts
SQL style isn't about personal preference — it's about the next person (including yourself) who has to read, debug, or extend your query under pressure. Clean SQL is a form of documentation.
The conventions here are not the only valid choices. What matters is consistency. Pick a style, document it, and automate enforcement where possible.
If you have conventions or habits that have saved you in production, drop them in the comments — I'd love to see what others have standardized.
Top comments (1)
Nice!
I think I always do all of the above - but, I didn't know about Common Table Expressions (CTEs) !