SQL for developers is how you read and write the systems you already ship — user accounts, orders, feature flags, observability tables. Backend engineers, full-stack builders, and data engineers share the same primitives: relational tables, stable keys, honest JOIN semantics around NULL, explicit grain for aggregates, and ACID discipline when concurrency hits.
What follows mirrors how teams onboard ICs — schema literacy, guarded CRUD, predicate hygiene, joins without silent row multiplication, GROUP BY / HAVING versus window analytics, then indexes, transactions, and EXPLAIN as your debugging lingua franca — every numbered skill block ends like sql interview questions with answers: runnable Postgres SQL, traced execution, and a terse why. After the hero art, dive straight into reps when you crave keyboard time:
Browse practice hub →, open SQL practice →, deepen joins →, reinforce filters →, or widen with database fundamentals →.
On this page
- Why SQL matters for developers and data engineers
- Tables, keys, and the shape of relational data
- Reading and writing rows — SELECT, INSERT, UPDATE, DELETE safely
- Filtering, NULLs, sorting, and LIMIT
- Joins — INNER, LEFT, and when rows multiply
- GROUP BY, HAVING, and analytics-style windows
- Indexes, transactions, ACID, and EXPLAIN-aware debugging
- Choosing SQL skills for your stack (checklist)
- Frequently asked questions
- Practice on PipeCode
1. Why SQL matters for developers and data engineers
The relational contract hiding behind every HTTP handler
Invariant: SQL is the persisted half of almost every SaaS workload — signup rows, entitlement tables, payout ledgers. SQL-for-developers fluency separates engineers who prototype quickly from teammates who can confidently answer "why did this counter disagree with finance?" without escalating.
Detailed explanation. Application tiers think in verbs (POST, PATCH, enqueue) while relational engines expose predicates, constraints, joins, and transactions. When those layers disagree, outages look like phantom bugs but read as inconsistent reads, missing indexes, duplicate grain, or unscoped transactions underneath.
Pro tip: Name grain aloud — exactly one row means one _______ — before you accept a `JOIN` plan or BI metric.
Why backends, analytics, and SRE converge on SQL
Everybody eventually asks Postgres the same primitive questions: correlations, aggregates, cardinality checks. Showing up fluent collapses Slack threads into scripts you can rerun, diff, commit to sql/ snippets, and instrument in CI smoke tests — even if warehouses later optimise the OLAP clones.
Worked example.
| persona | recurrent question type | payoff from SQL literacy |
|---|---|---|
| backend | `UPDATE` fan-out / orphaned FK rows | deterministic migrations |
| DS / analytics | reproducible cohort filters | parameterized queries |
| SRE | blast-radius queries during incidents | `EXPLAIN`-aware rollbacks |
Step-by-step.
- Incident alert references revenue drift — replicate via a `JOIN` spanning `payments` ⇄ `refunds`.
- Count rows twice — once naive, once with explicit `DISTINCT grain_key` assertions.
- Validate indexes cover `WHERE` + `JOIN` predicates in staging before prod deploy.
- Document literal-free SQL snippets for analytics parity.
OLTP versus analytic workloads
| workload | emblematic SQL | optimise for |
|---|---|---|
| OLTP (`INSERT`, `UPDATE` critical rows) | `UPDATE … WHERE id = $1 RETURNING …` | short latch-friendly txn windows |
| Warehouse / OLAP | `SUM(x) GROUP BY day` (+ optional windows) | columnar parallelism, partition pruning |
| Replica analytics bridging both | parameterized slice queries | reproducible predicates + timeouts |
Detailed explanation. Mixed-mode Postgres instances still differentiate by predicate selectivity, txn duration, and hardware headroom. Front-line CRUD hates long reads holding locks; heavyweight BI tolerates eventual freshness but demands honest scan plans.
Common beginner traps
- Mirroring spreadsheets — unstructured columns creep into DDL without reviewers.
- Trusting dashboards over source tables — BI layers often coerce grain silently.
- ORM-only troubleshooting — you still need emitted SQL logs to spot hidden `N+1` joins.
- Ignoring isolation trade-offs — `READ COMMITTED` anomalies appear only under concurrency realism.
2. Tables, keys, and the shape of relational data
Tables are unordered multisets guarded by declarative constraints
Invariant: a row-oriented table expresses one record type; keys pin identity while foreign keys articulate relationships declaratively rather than scattering pointer logic exclusively in Ruby/Java services.
Primary versus natural identifiers
Natural keys mirror business artefacts (ISO country code). Surrogate IDs (BIGSERIAL) stay stable across refactors yet carry no domain meaning alone. Postgres typically combines both: surrogate PK for joins, UNIQUE(email) to enforce human-facing identity.
Worked example.
| design | advantage | caveat |
|---|---|---|
| `BIGSERIAL PRIMARY KEY` | immutable join edges | meaningless for humans |
| `UNIQUE(email)` | legible dashboards | brittle if mergers rename emails |
Worked-example DDL commentary. BIGSERIAL auto-sequence ensures monotonic surrogates with cheap index inserts. NOT NULL on email encodes onboarding invariants Postgres enforces deterministically versus optional app validation.
```sql
CREATE TABLE users (
  id         BIGSERIAL PRIMARY KEY,
  email      TEXT NOT NULL UNIQUE,
  city       TEXT,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE orders (
  order_id  BIGSERIAL PRIMARY KEY,
  user_id   BIGINT NOT NULL REFERENCES users(id) ON DELETE RESTRICT,
  total_usd NUMERIC(14,2) NOT NULL CHECK (total_usd >= 0),
  placed_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX orders_user_idx ON orders (user_id);
```
Foreign-key delete semantics matter in production churn
The difference between ON DELETE CASCADE (deletes ripple downstream) and ON DELETE RESTRICT (the delete is blocked while child rows still reference the parent) dictates safe admin tooling workflows. Prefer explicit policies over implicit defaults guessed during incidents.
```sql
-- Example: disallow deleting buyers with unpaid orders lingering
ALTER TABLE orders
  DROP CONSTRAINT orders_user_id_fkey,
  ADD CONSTRAINT orders_user_id_fkey
    FOREIGN KEY (user_id)
    REFERENCES users(id)
    ON DELETE RESTRICT;
```
Step-by-step.
- Model parent (`users`) and child (`orders`) cardinality first.
- Select a delete policy matching business law — finance rarely cascades blindly.
- Index child FK columns (`CREATE INDEX` on `orders.user_id`).
- Add `CHECK` constraints early — cheaper than patching corrupt rows later.
Common beginner mistakes
- Deferring FK creation → silent orphan rows creep under concurrent writers.
- Currency as floats — use `NUMERIC(p,s)` for monetary columns.
- Timestamp without zone interpreted as UTC — prefer `TIMESTAMPTZ` plus an explicit TZ policy.
- Overloading JSONB exclusively — structured columns keep optimiser hints honest.
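The float trap above is quick to demonstrate in a scratch psql session (a minimal sketch; the exact float rendering depends on server version and `extra_float_digits`):

```sql
-- float8 arithmetic accumulates binary rounding error; NUMERIC stays exact
SELECT 0.1::float8  + 0.2::float8  AS float_sum,   -- typically 0.30000000000000004
       0.1::numeric + 0.2::numeric AS exact_sum;   -- 0.3
```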
Quick integrity checklist
| check | Postgres hook |
|---|---|
| uniqueness | `UNIQUE`, partial unique indexes |
| domain logic | `CHECK`, domain types |
| referential coupling | declarative FK + chosen `ON DELETE` |
| auditing | triggers or append-only ledger tables |
Rule of thumb: if two services disagree on cardinality, converge on DDL truth before arguing in Slack.
3. Reading and writing rows — SELECT, INSERT, UPDATE, DELETE safely
Every writer path needs identity — explicit filters and returning projections
Invariant: INSERT, UPDATE, DELETE must name which rows mutate (WHERE), prefer parameters ($1) over interpolated strings, and return affected rows (RETURNING) when callers need confirmations without another round-trip.
Transactions wrap multi-leg business truths
Transfers, swaps, and entitlement downgrades seldom satisfy business law by touching one isolated row. Wrap the coordinated statements (BEGIN … COMMIT) so partial states never linger.
```sql
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
```
Why this works:
- `BEGIN` opens a snapshot / locking context per the chosen isolation level.
- Two ordered `UPDATE`s express money conservation — auditors expect the atomicity tie.
- `COMMIT` publishes both deltas together; `ROLLBACK;` rewinds catastrophes.
- Cost — transactional overhead is negligible versus financial-inconsistency fallout.
Parameterized predicates prevent injection and cache reuse
Never splice user strings manually — placeholders keep plans cacheable and thwart SQL injection.
```sql
DELETE FROM sessions
WHERE user_id = $1
  AND expires_at < NOW();
```
INSERT patterns developers lean on daily
Bulk ingest + upsert choreography appears constantly — understand both single-row ergonomics (RETURNING) and batch paths.
```sql
INSERT INTO users (email, city)
VALUES ('dev@corp.com', 'Chennai')
RETURNING id, created_at;

INSERT INTO audit_log(event, payload)
VALUES
  ('password_reset', '{"user_id": 42}'::jsonb),
  ('mfa_challenge', '{"user_id": 42}'::jsonb);
```
Worked example — idempotent onboarding upserts (ON CONFLICT).
```sql
INSERT INTO user_profiles (user_id, display_name)
VALUES ($1, $2)
ON CONFLICT (user_id)
DO UPDATE SET display_name = EXCLUDED.display_name,
              updated_at = NOW();
```
Detailed explanation.
- Conflicting rows recycle through the `EXCLUDED` pseudo-table exposing the proposed values.
- Pair with partial unique indexes for conditional uniqueness scenarios (invite tokens, soft deletes).
Safe UPDATE hygiene checklist
| step | rationale |
|---|---|
| dry-run SELECT clone | verifies row cardinality before mutation |
| `LIMIT` + key filter | avoids blanket table rewrite |
| `RETURNING` auditing | emits affected-row diffs programmatically |
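The checklist in sequence, sketched against the earlier `sessions` cleanup; the `revoked` column is hypothetical:

```sql
-- Step 1: dry-run SELECT clone to confirm the blast radius
SELECT COUNT(*)
FROM sessions
WHERE user_id = $1 AND expires_at < NOW();

-- Step 2: mutate with the identical predicate, emitting the rows touched
UPDATE sessions
SET revoked = TRUE              -- hypothetical audit column
WHERE user_id = $1 AND expires_at < NOW()
RETURNING id, expires_at;
```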
Rule of thumb: if deleting “temp junk,” still scope by timestamp + TTL — surprise full-table wipes bankrupt trust faster than slow queries.
4. Filtering, NULLs, sorting, and LIMIT
Logical evaluation order is not top-to-bottom textual order
Invariant: Engines conceptually apply clauses in this order — FROM → JOIN → WHERE → GROUP BY → HAVING → windowing → SELECT expressions → DISTINCT → ORDER BY → LIMIT/OFFSET. Textual SQL order differs deliberately; misplacing expectations causes “phantom” aggregates or illegal references.
Detailed explanation.
- Predicates in `WHERE` see raw row grain before collapsing via `GROUP BY`.
- `SELECT` aliases rarely appear inside `WHERE` (Postgres exceptions exist for subqueries, not shortcuts).
- `ORDER BY` runs after projection — you may sort computed expressions.
NULL is tri-valued logic (TRUE, FALSE, UNKNOWN)
Comparisons involving UNKNOWN ripple through compound predicates unpredictably unless you memorize De Morgan interactions with AND/OR.
```sql
SELECT * FROM users WHERE city IS NULL;  -- correct
SELECT * FROM users WHERE city = NULL;   -- always UNKNOWN → zero rows (pitfall)

SELECT *
FROM experiments
WHERE status IS DISTINCT FROM outcome;   -- treats NULL knowingly
```
Worked example.
| predicate | evaluates when city is NULL |
|---|---|
| `city = 'Chennai'` | UNKNOWN |
| `city IS NULL` | TRUE |
Worked-example pattern — COALESCE bridging optional columns.
```sql
SELECT COALESCE(city, 'unknown') AS city_label
FROM users;
```
Combine with NULLIF to coerce sentinel blanks into canonical NULL semantics.
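One common composition, a sketch assuming `city` may arrive as an empty or padded string:

```sql
-- NULLIF turns sentinel blanks into NULL; COALESCE then labels them
SELECT COALESCE(NULLIF(TRIM(city), ''), 'unknown') AS city_label
FROM users;
```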
Composing predicates responsibly
Prefer explicit parentheses when mixing AND/OR:
```sql
SELECT id, email
FROM users
WHERE (city IN ('Hyderabad', 'Chennai') OR vip IS TRUE)
  AND suspended IS FALSE;
```
ORDER BY, LIMIT:
```sql
SELECT id, email
FROM users
WHERE city IN ('Hyderabad', 'Chennai')
ORDER BY email ASC
LIMIT 25 OFFSET 50;
```
Detailed explanation.
- Large OFFSET values scan skipped rows wastefully — keyset pagination (`WHERE id > $cursor ORDER BY id LIMIT …`) is often cheaper at scale even if ergonomically heavier.
- Stable sorts duplicate tie-break columns (`ORDER BY created_at DESC, id DESC`) to defeat nondeterministic pages.
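The keyset alternative, sketched against the `users` table; `$1` carries the last `id` the client saw:

```sql
-- First page
SELECT id, email
FROM users
ORDER BY id
LIMIT 25;

-- Subsequent pages: seek past the cursor instead of counting an OFFSET
SELECT id, email
FROM users
WHERE id > $1           -- last id from the previous page
ORDER BY id
LIMIT 25;
```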
Common beginner traps
- `NOT IN (...)` collapses unexpectedly when the inner list harbors a `NULL` (the entire predicate becomes UNKNOWN).
- `BETWEEN` endpoints are inclusive, surprising folks expecting half-open intervals.
- `ILIKE '%foo%'` cannot exploit plain B-tree indexes without pg_trgm / expression indexes.
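The `BETWEEN` trap in date terms, a sketch against the earlier `orders` table:

```sql
-- Inclusive on both ends: a row stamped exactly '2024-02-01 00:00' lands in
-- this query AND in the next query that starts at '2024-02-01'
SELECT COUNT(*)
FROM orders
WHERE placed_at BETWEEN '2024-01-01' AND '2024-02-01';

-- Half-open interval: every row lands in exactly one month bucket
SELECT COUNT(*)
FROM orders
WHERE placed_at >= '2024-01-01'
  AND placed_at <  '2024-02-01';
```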
Optional pattern — existential filters with EXISTS
Prefer semijoins when probing presence without caring about multiplicity:
```sql
SELECT u.id
FROM users u
WHERE EXISTS (
  SELECT 1 FROM orders o WHERE o.user_id = u.id
);
```
Rule of thumb: when debugging filters, SELECT COUNT(*) before and after layering predicates — divergence isolates offending clause fast.
5. Joins — INNER, LEFT, and when rows multiply
Joins reshape cardinality — tame fan-out before aggregates
Invariant: JOIN combines row sets via predicates (typically equality on keys). When either side is one-to-many, the result repeats left rows once per matching right row unless you stabilize grain first.
Detailed explanation.
- `INNER JOIN` emits only pairs that satisfy the `ON` clause — unmatched rows on either side disappear.
- `LEFT [OUTER] JOIN` keeps every row from the left spine; unmatched right columns become `NULL` (sentinel absence, not "empty string").
- `FULL OUTER` is rarer in app code but useful when reconciling two feeds where either side might be orphaned.
```sql
-- Buyer activity: users who ordered at least once (one row per matching order pair)
SELECT u.name, o.order_id
FROM users u
JOIN orders o ON o.user_id = u.id;

-- Anti-join — users present on the LEFT with NO matching orders on the RIGHT
SELECT u.name
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE o.order_id IS NULL;
```
INNER vs LEFT in one rehearsal table
| join | survives without match on opposite side | read it as |
|---|---|---|
| `INNER` | no | "pairs only — inner intersection" |
| `LEFT` | yes (left survives) | "keep cohort A, optionally attach B" |
Fan-out rehearsal (why COUNT(*) lied)
| spine | facts | INNER join rows if 3 matching orders |
|---|---|---|
| 1 Alice | 3 orders for Alice | 3 rows named Alice |
`COUNT(*)` and `SUM(amount)` downstream now count result rows, not necessarily distinct users. Fix upstream with distinct keys, sub-aggregates, or semi-joins (`EXISTS` / `IN`) depending on semantics.
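Two fan-out-safe rewrites, sketched against the same `users` ⇄ `orders` pair:

```sql
-- Count distinct users, not join rows
SELECT COUNT(DISTINCT u.id) AS buyers
FROM users u
JOIN orders o ON o.user_id = u.id;

-- Or pre-aggregate the many side first, restoring one-row-per-user grain
SELECT u.id, COALESCE(t.order_count, 0) AS order_count
FROM users u
LEFT JOIN (
  SELECT user_id, COUNT(*) AS order_count
  FROM orders
  GROUP BY user_id
) t ON t.user_id = u.id;
```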
Developer SQL interview question — users who never ordered
Tables: users(id, name) and orders(order_id, user_id) with referential integrity. List dormant accounts (users who never placed an order), ordered deterministically.
Solution — LEFT JOIN anti-join
Code solution.
```sql
SELECT u.id, u.name
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE o.order_id IS NULL
ORDER BY u.id;
```
Step-by-step trace.
| step | planner story |
|---|---|
| 1 | Build left spine — one output row candidate per user. |
| 2 | For each user, seek/probe matching orders on orders.user_id. |
| 3 | If no row qualifies, o.* becomes NULL, including surrogate order_id. |
| 4 | `WHERE o.order_id IS NULL` keeps only unmatched left rows (an anti-pattern if order_id could be NULL absent the FK — integrity prevents that here). |
Output:
| id | name |
|---|---|
| 404 | dormant |
Companion pattern — NOT EXISTS (duplicate-safe semi-join).
```sql
SELECT u.id, u.name
FROM users u
WHERE NOT EXISTS (
  SELECT 1 FROM orders o WHERE o.user_id = u.id
)
ORDER BY u.id;
```
Why this works — concept by concept:
- LEFT preservation — every user survives until filtered; the dormant cohort remains visible.
- NULL sentinel — with a real surrogate key on orders, a NULL `order_id` reliably means no attachment, not an ambiguous data void.
- NOT EXISTS — logically ignores duplicate orders per user (semijoin); no accidental multiplier from exploding the right side.
- Index leverage — a B-tree on `orders(user_id)` turns probes into seeks instead of scans.
- Cost — hash and merge-family joins tend toward Θ(n + m) with healthy selectivity versus nested-loop blowups on accidental cross products.
6. GROUP BY, HAVING, and analytics-style windows
Collapse grain or stay row-wise — aggregates vs windows split the job
Invariant: GROUP BY collapses rows into buckets, producing one output row per group after aggregation. OVER() keeps input grain — every source row survives while you decorate it with comparative metrics (rank, running sum).
Detailed explanation.
- After `JOIN`/`WHERE`, the relational engine optionally groups by the listed expressions. `SELECT` may reference either grouping keys or aggregate functions evaluated inside each bucket — stray base columns violate the contract unless functionally dependent (Postgres errs early; beware lenient dialects permitting hidden ambiguity).
- `HAVING` filters on post-aggregation predicates (`COUNT(*) > 10`), while `WHERE` trims rows before bucketing (it cannot reference `SELECT` aggregate aliases — repeat the aggregate or nest a subquery).
- Window frames (`ROWS`/`RANGE`/`GROUPS`) default per function; analytic ranking (`ROW_NUMBER`, `RANK`, `DENSE_RANK`) ignores the frame — they only need `PARTITION BY` + `ORDER BY`.
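A running-sum sketch against the earlier `orders` table; the explicit `ROWS` frame differs from the `RANGE` default only when `placed_at` ties exist:

```sql
-- Running revenue per user; row grain preserved, no GROUP BY collapse
SELECT user_id,
       placed_at,
       total_usd,
       SUM(total_usd) OVER (
         PARTITION BY user_id
         ORDER BY placed_at
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_total
FROM orders;
```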
Stock patterns side by side
```sql
-- Collapse city grain — one row per city after aggregation
SELECT city, COUNT(*) AS n
FROM users
GROUP BY city
HAVING COUNT(*) > 5;

-- Decorate employee grain — ranking within department without collapsing rows
SELECT emp_id,
       dept_id,
       salary,
       ROW_NUMBER() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rn
FROM employees;
```
Dedup companion — deterministic keeper rows
Grouped counts diagnose violators; windows remediate duplicates while picking one canonical row (partition by the key, ORDER BY audit columns, then keep `WHERE rn = 1` for survivors or `rn > 1` for offenders):
```sql
WITH ranked AS (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY email
           ORDER BY updated_at DESC NULLS LAST, id ASC
         ) AS rn
  FROM users
)
SELECT *
FROM ranked
WHERE rn > 1;  -- offenders to DELETE or archive in a batched migration
```
Developer SQL interview question — duplicate emails awaiting cleanup
List normalized emails appearing more than once with violation counts.
Solution — grouped counts
Code solution.
```sql
SELECT email, COUNT(*) AS dup_count
FROM users
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY dup_count DESC, email;
```
Step-by-step trace.
| step | computation |
|---|---|
| 1 | Normalize upstream if needed (LOWER(TRIM(email)) in ingestion or projection). |
| 2 |
GROUP BY email emits one accumulator per distinct key. |
| 3 |
HAVING COUNT(*) > 1 drops singleton buckets — only violators survive. |
Output:
| email | dup_count |
|---|---|
| dev@corp | 3 |
Why this works — concept by concept:
- GROUP BY bucket — each email becomes its own multiset; `COUNT(*)` measures multiplicity after predicates.
- HAVING vs WHERE — `WHERE` cannot express `COUNT(*) > 1` without a subquery; `HAVING` runs on aggregated state.
- Operational pairing — follow with `ROW_NUMBER()` partitioned on the same dedupe key to choose survivors without arbitrary ties.
- Cost profile — hash aggregation is typically linear in row volume; spilled sorts add I/O proportional to `work_mem` pressure.
7. Indexes, transactions, ACID, and EXPLAIN-aware debugging
Indexes + planners + transactional contracts separate “queries” from “systems”
Detailed explanation.
- Default B-tree indexes accelerate equality / range predicates (`WHERE city = 'Hyderabad'` or `WHERE created_at BETWEEN …`) via seeks instead of sequential scans once selectivity warrants them.
- Indexes are derived state — every insert/update/delete touches matching index entries (write amplification); partial indexes prune maintenance when a predicate covers a cohort (`WHERE churned IS FALSE`).
- `CREATE INDEX CONCURRENTLY` avoids holding the write-blocking lock a plain build takes on hot Postgres tables — the trade-off is a longer build plus retry bookkeeping (dropping the invalid index) if it fails midway.
```sql
CREATE INDEX CONCURRENTLY idx_users_city ON users (city);

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM users
WHERE city = 'Hyderabad';
```
Interview-grade EXPLAIN reading hints.
- Prefer `EXPLAIN (ANALYZE, BUFFERS)` for truth about buffers hit vs read, loop counts, and actual row counts — textual plans alone omit runtime skew.
- Watch estimated vs actual row mismatches (bad stats ⇒ wrong join order, surprise nested loops on big inner sides).
- `Seq Scan` is acceptable on narrow tables / cold caches; punitive when millions of qualifying rows funnel through a filter after the scan.
ACID anchors
| letter | mnemonic | what to say in-panel |
|---|---|---|
| A | Atomicity | all statements commit together or ROLLBACK rewinds observable effects |
| C | Consistency | constraints + declarative checks hold on commit boundaries |
| I | Isolation | levels trade phantom reads vs concurrency — READ COMMITTED default on Postgres exposes committed deltas each statement; REPEATABLE READ / MVCC snapshots tame drift for longer read transactions |
| D | Durability | WAL persists committed work across crashes |
Practical transaction hygiene
Declare explicit boundaries (BEGIN / COMMIT) when orchestrating multi-table invariants — ORMs emitting autocommit per statement fracture money-movement narratives. Serialization failures (SQLSTATE 40001) under SERIALIZABLE / snapshot conflicts signal retry opportunities, not “random database bugs.” Pair schema changes with CONCURRENT index creations and phased backfills whenever traffic tolerates eventual consistency stages.
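A sketch of an explicitly serializable transfer, reusing the earlier `accounts` example; the retry loop itself lives in application code:

```sql
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
-- If any statement (including COMMIT) raises SQLSTATE 40001
-- (serialization_failure), the application rolls back and re-runs
-- the whole block instead of surfacing an error to the user.
```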
Choosing SQL skills for your stack (checklist)
| horizon | competency | Practice lane |
|---|---|---|
| Week 1 | DDL + FK stories | /explore/practice/topic/database → |
| Week 2 | predicates + paging | /explore/practice/topic/filtering/sql → |
| Week 3 | joins + cardinality | /explore/practice/topic/joins/sql → |
| Week 4 | aggregates/windows | /explore/practice/topic/aggregation/sql → |
Frequently asked questions
Which dialect first?
PostgreSQL is the pragmatic default — rich standard SQL surface, expressive JSONB, strong isolation story, ubiquitous in modern stacks, and interview panels often cite it explicitly. Treat SQLite as excellent for correctness drills and EXPLAIN QUERY PLAN intuition, MySQL where legacy hiring signals demand it — but defer dialect trivia until relational mechanics feel automatic.
LEFT JOIN … IS NULL versus NOT EXISTS?
Both express relational difference (anti-semijoins). LEFT JOIN + NULL filter is readable when order_id is a dependable surrogate and you want symmetry with exploratory outer joins. NOT EXISTS scales mentally when duplicates on the right explode intermediate row counts — multiplicity does not falsify existential absence. Mention both in reviews; pick based on cardinality and planner friendliness verified with EXPLAIN.
When do aggregates beat windows?
Collapse to metric tables or cohort summaries ⇒ GROUP BY + aggregates + HAVING. Preserve row grain while ranking/deduping/running deltas ⇒ OVER() partitions. Mixed patterns stack CTE layers: aggregate subtotals upstream, JOIN keyed aggregates back, then window across enriched rows once uniqueness returns.
Are indexes unequivocally good?
No — each index consumes storage, lengthens mutation paths (insert/update hotspots), and complicates migrations. Prefer narrow partial indexes, covering composites aligning with ORDER BY when read savings dominate, but baseline with metrics (p95 latency, churn on write-heavy queues) rather than speculative indexing folklore.
What proves seniority fastest?
Fluent EXPLAIN storytelling (buffer churn, mis-estimates), articulating isolation anomalies you have actually chased (lost updates, phantom reads), and disciplined schema evolution (locking, backfills, concurrency-safe DDL). Read alone plateaus — annotate past incidents with reproduction SQL and planner diffs during debrief loops.
Do ORMs replace raw SQL literacy?
ORMs scaffold CRUD ergonomics quickly, yet N+1, implicit transaction scopes, brittle migrations, deadlock graphs, and hot-index regressions still surface raw SQL realities. Seniors jump between declarative mappings and handwritten SQL confidently because production incidents rarely respect abstraction boundaries exclusively.
Practice on PipeCode
PipeCode ships 450+ interview-grade problems spanning SQL joins, aggregates, transactional logic, analytics windows, subqueries, and pragmatic schema debugging. Anchor on Explore practice →, escalate through language SQL →, grab sets like topic SQL → or subqueries →, and unlock plans → whenever you want unrestricted runs.