If you have ever tried to keep your app's data in a JSON file or in memory, you know how that story ends. At first it works. Two users? Easy. A hundred users with orders, addresses, and a history of returns? Now your "database" is a 40MB file you are scared to open, and looking up a single record takes a full scan of the disk.
You start writing helper functions. Find a user by email. Find all their orders. Make sure no order points at a missing user. After a week of this, you have invented a worse, slower database, and you still cannot answer "what were our top 5 selling books last month?".
That is the gap SQL fills.
What is SQL, really
Think of SQL as a giant spreadsheet on steroids, with a strict librarian living inside. Your data goes into tables (sheets). Each row is a record. Each column has a type and rules. The librarian enforces the rules: no missing required fields, no duplicate ids, no orders pointing to a user that does not exist.
You do not poke around the sheets yourself. You write a polite question in a special language, and the librarian goes off, walks the shelves, and brings back exactly the rows you asked for. The question is declarative. You say what you want, not how to get it. The librarian (the query planner) figures out the fastest path.
That is the whole vibe.
Let's pretend we are building one
We want a way to store structured data, enforce rules between pieces of data, and answer questions about it without writing custom code every time. We will call the language SQL (Structured Query Language) and the engine behind it a relational database.
For our running example, we are opening a tiny online bookstore. Books, authors, customers, orders. We will model the whole thing and learn to ask it useful questions.
Decision 1: Data lives in tables, with a strict shape
A table has a fixed set of columns. Each column has a type. Each row must obey the shape. No "this row has an extra field" surprises.
CREATE TABLE authors (
id BIGSERIAL PRIMARY KEY,
name TEXT NOT NULL,
born_year INT
);
CREATE TABLE books (
id BIGSERIAL PRIMARY KEY,
title TEXT NOT NULL,
author_id BIGINT NOT NULL REFERENCES authors(id),
price_cents INT NOT NULL CHECK (price_cents >= 0),
published DATE,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE TABLE customers (
id BIGSERIAL PRIMARY KEY,
email TEXT NOT NULL UNIQUE,
name TEXT NOT NULL
);
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
customer_id BIGINT NOT NULL REFERENCES customers(id),
book_id BIGINT NOT NULL REFERENCES books(id),
qty INT NOT NULL CHECK (qty > 0),
placed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
A few things worth pausing on, because they show up everywhere:
-
PRIMARY KEYis the row's unique identity. Every table has one. -
REFERENCES other_table(id)is a foreign key. The librarian will refuse to insert an order for a customer that does not exist, and refuse to delete a customer who still has orders (unless you sayON DELETE CASCADE). -
NOT NULL,UNIQUE,CHECKare constraints that bake business rules into the data, not into the app code. The database becomes the last line of defense, even if six different apps write to it. -
DEFAULTlets the database fill in values for you (timestamps, ids, version numbers).
This shape is the schema. The whole database is just schemas plus rows.
Quick sanity rule: if your data has a stable shape and relationships matter, use SQL. If your data is messy, document like, and varies row to row, that is what we will discuss in the MongoDB post.
Decision 2: Four verbs, infinite combinations
Every interaction with SQL is one of four verbs. Learn these and you have learned 70% of the language.
Insert
INSERT INTO authors (name, born_year)
VALUES ('Saint-Exupery', 1900);
INSERT INTO books (title, author_id, price_cents, published)
VALUES
('The Little Prince', 1, 999, '1943-04-06'),
('Night Flight', 1, 1199, '1931-01-01');
Select (the workhorse)
SELECT id, title, price_cents
FROM books
WHERE price_cents < 1500
ORDER BY published DESC
LIMIT 10;
Read it like English: pick these columns, from this table, where this is true, sorted this way, give me the first 10.
Update
UPDATE books
SET price_cents = 1099
WHERE id = 1;
If you forget the WHERE, you update every row. Yes, every senior has done it once. Wrap big updates in a transaction, we will see how soon.
Delete
DELETE FROM books WHERE id = 1;
Same warning as UPDATE. No WHERE, no rows left.
Decision 3: Filtering rows the librarian can understand
Inside WHERE you can stack conditions. A short list of the most useful operators:
SELECT * FROM books
WHERE price_cents BETWEEN 800 AND 1500
AND title ILIKE '%prince%' -- case insensitive in Postgres
AND author_id IN (1, 2, 3)
AND published IS NOT NULL
AND published >= DATE '2000-01-01';
The classic gotcha: NULL is not equal to anything, not even itself. WHERE col = NULL always returns no rows. Use IS NULL and IS NOT NULL. Always.
LIKE 'foo%' matches the prefix foo. LIKE '%foo' matches the suffix. LIKE '%foo%' is the full text contains. The percent sign is the wildcard.
Decision 4: Joining tables, the actual superpower
Splitting data into multiple tables only pays off if we can stitch them back together at query time. That is what joins do.
The four flavors that matter:
-- INNER JOIN: only rows that match in both tables
SELECT b.title, a.name AS author
FROM books b
JOIN authors a ON a.id = b.author_id;
-- LEFT JOIN: every row from the left, with NULLs where the right has nothing
SELECT a.name, b.title
FROM authors a
LEFT JOIN books b ON b.author_id = a.id;
-- An author with zero books still appears, with title = NULL.
-- RIGHT JOIN: mirror image of LEFT JOIN. Rarely used (just flip the tables).
-- FULL OUTER JOIN: every row from both sides, NULL where the other is missing.
The ON clause says how to match. The WHERE clause filters the joined result.
Two senior level rules that catch newcomers:
-
A
LEFT JOINplus aWHEREon the right table acts like anINNER JOIN. If you writeLEFT JOIN books WHERE books.published IS NOT NULL, you have just thrown away the unmatched rows. Put filters that should keep the unmatched rows inside theONclause:LEFT JOIN books b ON b.author_id = a.id AND b.published IS NOT NULL. -
Always alias your tables in non trivial queries.
b.titleis much easier to read (and grep) thanbooks.titlerepeated five times.
Decision 5: Aggregations, the "summarize this" verbs
When you do not want individual rows but a roll up, you reach for GROUP BY and the aggregate functions: COUNT, SUM, AVG, MIN, MAX.
-- Top selling books last month
SELECT b.id, b.title, SUM(o.qty) AS sold
FROM orders o
JOIN books b ON b.id = o.book_id
WHERE o.placed_at >= NOW() - INTERVAL '30 days'
GROUP BY b.id, b.title
ORDER BY sold DESC
LIMIT 5;
Two rules to internalize:
- Every column in the
SELECTmust either be in theGROUP BYor wrapped in an aggregate. -
HAVINGfilters groups.WHEREfilters rows before grouping. Do not confuse them.
SELECT customer_id, COUNT(*) AS orders_count
FROM orders
GROUP BY customer_id
HAVING COUNT(*) >= 3; -- only customers with 3+ orders
Decision 6: When one query is not enough, build it in pieces
For complex queries, stacking joins and aggregations gets unreadable fast. We added two tools for that.
Subqueries
A query inside a query. Treat its result as a virtual table.
SELECT title
FROM books
WHERE author_id IN (SELECT id FROM authors WHERE born_year < 1950);
Common Table Expressions (CTEs)
Same idea, named, top to bottom. Far more readable when the query has multiple stages.
WITH top_books AS (
SELECT book_id, SUM(qty) AS sold
FROM orders
WHERE placed_at >= NOW() - INTERVAL '30 days'
GROUP BY book_id
)
SELECT b.title, t.sold
FROM top_books t
JOIN books b ON b.id = t.book_id
ORDER BY t.sold DESC
LIMIT 5;
CTEs can be recursive, which is how you walk tree shaped data (categories with subcategories, employees with managers, comments with replies):
WITH RECURSIVE descendants AS (
SELECT id, parent_id, name FROM categories WHERE id = 7
UNION ALL
SELECT c.id, c.parent_id, c.name
FROM categories c
JOIN descendants d ON d.id = c.parent_id
)
SELECT * FROM descendants;
CTEs read top to bottom like a small program. Use them.
Decision 7: Window functions, the senior superpower
Sometimes you want a per row calculation that "sees" other rows around it without collapsing into groups. That is what window functions do.
-- Rank books by sales within each author
SELECT
b.author_id,
b.title,
SUM(o.qty) AS sold,
RANK() OVER (
PARTITION BY b.author_id
ORDER BY SUM(o.qty) DESC
) AS rank_for_author
FROM books b
JOIN orders o ON o.book_id = b.id
GROUP BY b.author_id, b.title;
OVER(...) says "do this calculation across this window". PARTITION BY is the group, ORDER BY is the order within the group. You get one output row per input row, plus the calculated value.
Common ones to know:
-
ROW_NUMBER()for "give each row a serial number". -
RANK()andDENSE_RANK()for tied ranking. -
LAG(col),LEAD(col)for "value of the previous/next row in the window". -
SUM(col) OVER (ORDER BY ...)for running totals.
Once these click, a whole class of "I had to do this in app code" problems become one liners.
Decision 8: Indexes, the secret to fast queries
A table without indexes is a phone book printed in random order. Finding "Alice" means starting at page one. With an index, the librarian builds an alphabetical map: "Alice is around page 47, go straight there".
CREATE INDEX books_author_id_idx ON books (author_id);
CREATE INDEX orders_customer_placed_idx ON orders (customer_id, placed_at DESC);
CREATE UNIQUE INDEX customers_email_idx ON customers (email);
The senior level rules:
- Primary keys and unique constraints already create indexes. You get those free.
- Index columns you filter or join on, not columns you only display.
-
Composite index column order matters.
(customer_id, placed_at)helps queries that filter bycustomer_id(alone) or bycustomer_idplusplaced_at. It does not help a query that only filters byplaced_at. -
Postgres has special index types: GIN (for arrays and JSONB), BRIN (for huge time series), partial (
WHERE active = true), expression (LOWER(email)). - Indexes are not free. Every write has to update them. Do not index every column "just in case". Profile first.
To find out what the database is actually doing for a query, ask:
EXPLAIN ANALYZE
SELECT * FROM books WHERE author_id = 1;
That returns the query plan. Look for "Index Scan" (good) versus "Seq Scan" (full table read, fine for small tables, scary for big ones), and look at the actual time per step.
Reading query plans is the single skill that separates "writes SQL" from "writes fast SQL". Spend time on it.
Decision 9: Transactions, all or nothing
Sometimes a single user action needs multiple statements, and "half done" is the worst possible state. Money moved out of one account but never into the other. An order created without payment.
We wrap the statements in a transaction. Either every statement commits, or none of them do.
BEGIN;
UPDATE accounts SET balance = balance - 1000 WHERE id = 1;
UPDATE accounts SET balance = balance + 1000 WHERE id = 2;
COMMIT; -- or ROLLBACK if anything went wrong
Transactions guarantee four properties together known as ACID:
- Atomicity: all or nothing.
- Consistency: constraints stay valid before and after.
- Isolation: concurrent transactions do not see each other's half done work.
- Durability: once committed, it survives a crash.
Isolation has levels (read committed, repeatable read, serializable). The default in Postgres is READ COMMITTED, which is fine for most apps. Reach for SERIALIZABLE when you have a critical multi step invariant and you would rather have the database retry than risk anomalies. Read about phantom reads and lost updates once, then you will recognize the bug shapes when you see them.
Decision 10: Views and materialized views
If a complex query is something you ask all the time, give it a name.
CREATE VIEW top_books_30d AS
SELECT b.id, b.title, SUM(o.qty) AS sold
FROM orders o
JOIN books b ON b.id = o.book_id
WHERE o.placed_at >= NOW() - INTERVAL '30 days'
GROUP BY b.id, b.title;
SELECT * FROM top_books_30d ORDER BY sold DESC LIMIT 5;
A view is just a saved query. It runs every time you SELECT from it.
A materialized view stores the result physically and refreshes on demand. Great for expensive analytics that do not need to be real time.
CREATE MATERIALIZED VIEW top_books_30d AS ...;
REFRESH MATERIALIZED VIEW top_books_30d;
Decision 11: Set operations
Less famous, but very handy.
-- Books either by author 1 or with sales (and both)
SELECT id FROM books WHERE author_id = 1
UNION
SELECT book_id FROM orders;
-- Same idea but keep duplicates
UNION ALL
-- In A but not in B
EXCEPT
-- In both
INTERSECT
UNION deduplicates and is slower. UNION ALL keeps everything and is faster. Pick on purpose.
Decision 12: Talking to SQL safely from your app
This is the one rule you must never break.
Never build a query by string concatenation with user input. That is how you get SQL injection, the most embarrassing bug in our profession.
// NEVER
db.query(`SELECT * FROM users WHERE email = '${email}'`);
// ALWAYS
db.query("SELECT * FROM users WHERE email = $1", [email]);
Use parameterized queries. Every modern client supports them. The driver sends the SQL and the values separately, so the database never confuses one for the other.
The other classic app side trap is the N+1 query problem. You fetch a list of authors, then for each one, you fire a separate query for their books. 1 query becomes 1 + N queries. The fix is one query with a JOIN, or batch loading by id list (WHERE author_id IN (...)).
Decision 13: Designing schemas that age well
The art is in the schema, not the queries. Tables you regret are tables you fight forever. A short senior level checklist:
- Normalize first, denormalize on purpose. A normalized schema (no duplicated facts) is easier to keep correct. You can always add a denormalized cache column later if profiling demands it.
-
Pick the right primary key.
BIGSERIAL(auto increment) is fine for most things. UUIDs are great for distributed systems and for not exposing row counts in URLs. UseUUID v7if you need sortable UUIDs. -
Always store timestamps with timezone.
TIMESTAMPTZ, notTIMESTAMP. Future you will thank present you. -
Money in integer cents, not floats.
1.10 + 2.20in floats is heartbreak. -
Booleans are fine. Do not use a
CHAR(1)"Y"/"N". This is not 1995. - Add indexes for foreign keys. The database does not always do this for you, and queries that join on them will sit and cry without one.
-
Soft delete with a
deleted_at TIMESTAMPTZcolumn if you need recoverable deletes. Then addWHERE deleted_at IS NULLto your reads (or a partial index). - Migrations are version controlled SQL. Tools like Flyway, Sqitch, Prisma Migrate, Alembic. Never edit the production schema by hand.
Decision 14: A peek under the hood
What really happens when you run a query:
- The client sends the SQL to the server (bind parameters separately if you used a parameterized query).
- The parser turns the text into a syntax tree.
- The planner / optimizer considers many possible execution plans (full scan, index scan, different join orders, hash join vs merge join vs nested loop). It uses statistics about the table sizes and column distributions to pick the cheapest plan it can find.
- The executor runs the chosen plan, reading pages from disk or the buffer cache, joining, filtering, aggregating.
- Results stream back to the client. Transactions wrap all of this in WAL (write ahead log) entries for durability.
That little planner is the brain of the whole thing. It is also why the same query can be fast on Tuesday and slow on Wednesday: the data shape changed, and the planner picked a different plan. Keep statistics fresh (ANALYZE in Postgres) and you will rarely be surprised.
A short guide to dialects
SQL is a standard. Real engines extend and tweak it. The big three you will see:
-
PostgreSQL: the senior favorite. Best in class JSON support, window functions, CTEs, generated columns, partial indexes,
RETURNING, extensions. Default choice for most new apps in 2026. -
MySQL / MariaDB: massive ecosystem, fine performance, weaker default constraints (set
STRICT_ALL_TABLES), historically weaker on JSON and window features (now caught up). - SQLite: a single file database, perfect for embedded use, tests, mobile, and surprisingly capable for small to mid sized apps. Great for prototypes.
Almost every example in this post is portable. The places you will find dialect differences: identifier quoting, auto increment syntax, JSON functions, full text search, and date functions.
Tiny tips that will save you later
-
SELECT *is fine in the editor, dangerous in production code. List the columns you need. -
Always run an
UPDATEorDELETEas aSELECTfirst. SameWHERE, same eyes. -
Wrap risky changes in a transaction with
BEGIN; ... ROLLBACK;until you are certain. -
COUNT(*)counts rows.COUNT(col)skips NULLs. They are different. -
Prefer
EXISTSoverINfor subqueries that may grow large. -
Filter early. Push
WHEREand joins as low as possible so fewer rows climb the pipeline. - Read the query plan before optimizing. Guessing is a waste of an evening.
- Connection pool, do not connect per request. Use PgBouncer, the language client's built in pool, or your framework's helper.
-
Put long names into a style guide.
snake_casefor tables and columns, plural for tables (books, notbook). Pick one and never argue about it again. - Back up the database. Test the restore. A backup you have never restored is a wish, not a backup.
Wrapping up
So that is the whole story. We were tired of inventing tiny worse databases out of files and helper functions. We built one engine that holds structured data, enforces relationships, and answers questions in a declarative language we agreed to call SQL.
We taught the engine four verbs: INSERT, SELECT, UPDATE, DELETE. We let it stitch tables back together with joins, summarize with aggregations, and walk windows with window functions. We added CTEs so complex queries read top to bottom. We made it fast with indexes, and gave it transactions and ACID guarantees so half done work is never visible. We baked rules into the schema with constraints, so the data stays sane even when buggy apps try to mess it up.
Once that map is in your head, every database tutorial, blog post, and codebase starts to feel familiar. SQL stops feeling like a foreign language and starts feeling like a calm conversation with a very picky librarian who is, secretly, on your side.
Happy querying, and put a bookmark in chapter two for me.
Top comments (0)