Context: These are distilled notes from CS50’s Introduction to Databases with SQL (Lecture 1). They focus on multi-table thinking: how to design relationships and query across tables with idiomatic SQL. Examples use SQLite syntax but are easily portable to PostgreSQL.
Table of Contents
- Table of Contents
- Relational Databases & ER Modeling
- Keys: Primary & Foreign
- Subqueries (Nested Queries)
- JOINs: INNER, LEFT/RIGHT/FULL, NATURAL
- Set Operations: UNION, INTERSECT, EXCEPT
- Grouping & Aggregation: GROUP BY & HAVING
- Style Tips
- Cheat Sheet
Relational Databases & ER Modeling
Relational databases split a domain (e.g., books & publishing) into entities (authors, books, publishers, translators, ratings) and define the relationships between them.
Relationship types
- 1–1: One author ↔ one profile.
- 1–many: One publisher → many books.
- many–many: Many authors ↔ many books (co-authorship).
ER (Entity–Relationship) diagrams
- Boxes = entities (tables).
- Lines = relationships.
- “Crow’s foot” = “many”; single bar
|
= “one”; circleo
= “optional (zero)”. - Read edges as sentences: “Author wrote Book (1..*)”.
Example tables
-- authors(id PK, name, ...)
-- books(id PK, title, publisher_id FK, ...)
-- publishers(id PK, name, ...)
-- authored(author_id FK → authors.id, book_id FK → books.id) -- junction for many–many
Keys: Primary & Foreign
-
Primary Key (PK): unique, stable identifier for a row (e.g.,
id SERIAL
/INTEGER
). - Foreign Key (FK): column that references another table’s PK, creating the relationship.
- Prefer compact numeric PKs over long natural keys (e.g., ISBN) for performance & simplicity.
Example
CREATE TABLE books (
id INTEGER PRIMARY KEY,
title TEXT NOT NULL,
publisher_id INTEGER REFERENCES publishers(id)
);
CREATE TABLE authored (
author_id INTEGER REFERENCES authors(id),
book_id INTEGER REFERENCES books(id),
PRIMARY KEY (author_id, book_id) -- prevent duplicates
);
Subqueries (Nested Queries)
Use subqueries when one query depends on the result of another (e.g., look up an id, then use it).
Find all books published by “Fitzcarraldo Editions”
SELECT title
FROM books
WHERE publisher_id = (
SELECT id FROM publishers WHERE name = 'Fitzcarraldo Editions'
);
Average rating for a specific title
SELECT AVG(r.rating) AS avg_rating
FROM ratings r
WHERE r.book_id = (
SELECT b.id FROM books b WHERE b.title = 'In Memory of Memory'
);
When to use IN
vs =
- Use
=
when subquery returns one value. - Use
IN (...)
when subquery returns multiple values:
SELECT title
FROM books
WHERE id IN (
SELECT book_id
FROM authored
WHERE author_id = (SELECT id FROM authors WHERE name = 'Fernanda Melchor')
);
JOINs: INNER, LEFT/RIGHT/FULL, NATURAL
JOIN stitches rows across tables using matching keys.
INNER JOIN (default JOIN
in SQLite)
Returns only rows with matches in both tables.
SELECT b.title, p.name AS publisher
FROM books b
JOIN publishers p ON p.id = b.publisher_id;
LEFT / RIGHT / FULL OUTER JOIN
Keep unmatched rows from left, right, or both tables (missing columns are NULL
).
SQLite supports
LEFT JOIN
.RIGHT/FULL
are common in PostgreSQL.
LEFT JOIN
SELECT a.name AS author, b.title
FROM authors a
LEFT JOIN authored au ON au.author_id = a.id
LEFT JOIN books b ON b.id = au.book_id;
NATURAL JOIN
Automatically joins on columns with the same name. Handy but be careful—implicit matches can surprise you if schemas change.
-- If both tables have a column named id:
SELECT * FROM sea_lions NATURAL JOIN migrations;
Tip: Prefer explicit ... ON table.col = table.col
in production schemas.
Set Operations: UNION, INTERSECT, EXCEPT
Work on compatible result sets (same number/type/order of columns).
All names that are either authors or translators
SELECT name FROM authors
UNION
SELECT name FROM translators;
Names that are both authors and translators
SELECT name FROM authors
INTERSECT
SELECT name FROM translators;
Authors who are not translators
SELECT name FROM authors
EXCEPT
SELECT name FROM translators;
Grouping & Aggregation: GROUP BY & HAVING
Aggregate multiple rows per group. HAVING
filters groups (after aggregation); WHERE
filters rows (before).
Average rating per book, only 4.0+
SELECT
r.book_id,
ROUND(AVG(r.rating), 2) AS avg_rating
FROM ratings r
GROUP BY r.book_id
HAVING AVG(r.rating) >= 4.0
ORDER BY avg_rating DESC;
Count of ratings per book
SELECT
r.book_id,
COUNT(*) AS ratings_count
FROM ratings r
GROUP BY r.book_id
ORDER BY ratings_count DESC;
Top-N with ties (PostgreSQL)
SELECT title, rating
FROM book_scores
ORDER BY rating DESC
FETCH FIRST 10 ROWS WITH TIES;
Style Tips
-
Case: UPPERCASE SQL keywords (
SELECT
,WHERE
), lowercase identifiers (table
,column
). - Quoting: Single quotes for string literals; double quotes for identifiers if needed.
- Formatting: Newlines + 2–4 space indents for subqueries and JOIN clauses.
-
Aliases: Short table aliases (
b
,a
,p
) improve readability. -
NULL-aware: Use
IS NULL
/IS NOT NULL
(not= NULL
).
Cheat Sheet
-- Subquery (single value)
... WHERE col = (SELECT id FROM ... WHERE ...);
-- Subquery (multiple values)
... WHERE col IN (SELECT id FROM ... WHERE ...);
-- Inner join
SELECT ... FROM t1
JOIN t2 ON t2.fk = t1.pk;
-- Left join
SELECT ... FROM t1
LEFT JOIN t2 ON t2.fk = t1.pk;
-- Sets
(SELECT ...)
UNION / INTERSECT / EXCEPT
(SELECT ...);
-- Grouping
SELECT key, AVG(val) AS avg_val
FROM t
GROUP BY key
HAVING AVG(val) > 0;
-- Order/Limits
ORDER BY col DESC
LIMIT 10;
Top comments (0)