DEV Community

Cover image for Designing Relational Schemas in SQLite — CS50 SQL Lecture 2 (Deep Notes + Examples)
Md Enayetur Rahman
Md Enayetur Rahman

Posted on

Designing Relational Schemas in SQLite — CS50 SQL Lecture 2 (Deep Notes + Examples)

Table of Contents


Why Schema Design Matters

Your schema is the shape of your data. Good schemas reduce duplication, keep writes and reads reliable, and make “the right” queries easy to express. In this lecture, we move from using schemas to designing them: declaring tables, picking types, defining keys, and applying constraints that enforce the rules of your domain.


Peeking Under the Hood: .schema and Friends

SQLite gives you a quick way to inspect the structure of a database.

# Open a database
sqlite3 path/to/db.sqlite3

# Show the CREATE statements for the whole DB
.schema

# Or for just one table
.schema books

# Pretty-print query results
.headers on
.mode table
Enter fullscreen mode Exit fullscreen mode

You can also quickly preview a table:

SELECT author, title
FROM longlist
LIMIT 5;
Enter fullscreen mode Exit fullscreen mode

Modeling the Real World: Entities & Relationships

Typical entities you’ll meet:

  • Thing: e.g., Book, Author, Station, Card
  • Relationship: e.g., Authorship, Visit/Swipe

A relationship can be:

  • One-to-many (a publisher → many books)
  • Many-to-many (authors ↔ books; riders ↔ stations via visits/swipes)

We represent many-to-many with a junction (aka join/associative) table:

Riders (id PK, name ...)
Stations (id PK, name, line ...)
Visits (id PK, rider_id FK → Riders.id, station_id FK → Stations.id)
Enter fullscreen mode Exit fullscreen mode

Normalization (1NF → 3NF) — A Beginner‑Friendly, No‑Hand‑Waving Guide

Normalization reduces redundancy and prevents update anomalies. Below is a practical, example‑driven treatment using the “subway rides” story, rewritten to answer common beginner questions explicitly.


Start: A denormalized log

rides_log
--------------------------------------------------------------
| id | rider_name | station_name | action | fare | balance  |
|----|------------|--------------|--------|------|----------|
| 1  | Charlie    | Kendall/MIT  | enter  | 0.10 | 0.05     |
| 2  | Charlie    | Jamaica Plain| exit   | 0.05 | 0.00     |
| 3  | Alice      | Harvard      | enter  | 0.10 | 0.20     |
| 4  | Alice      | Park Street  | exit   | 0.05 | 0.15     |
Enter fullscreen mode Exit fullscreen mode

Problems

  • Rider names repeat; no way to distinguish two “Charlies.”
  • Station names repeat; “Park Street” string appears everywhere.
  • Updating a station’s metadata (e.g., line color) requires touching many rows.
  • Rider‑specific facts (e.g., email) don’t belong in a station row, and vice versa.

Key terms (plain English)

What is a multi‑valued field and why split it?

A column that stores more than one value at once (e.g., "Red,Blue" or ["Red","Blue"]). In 1NF, each column holds one value per row.

Fix: Move the repeated values to a separate table and connect via a junction table.

What are atomic columns / “each cell is atomic”?

“Atomic” means one value, one meaning, not a bundle. If you often need a part of a value, it isn’t atomic for your use:

  • full_name may be OK—unless you often need first_name and last_name separately.
  • Avoid lists/CSV/arrays/JSON blobs in a single cell.

What are entities vs facts?

  • Entities are the “nouns” you track over time (have identity): Rider, Station.
  • Facts are properties or events about entities: a visit/swipe that connects a rider and a station and records action, fare, balance_after.

1NF — First Normal Form (the “atomic values” rule)

Rule of thumb:

  • Every table has a primary key (unique identifier for each row).
  • Each cell is atomic (no multi‑valued fields, no lists).
  • No repeating groups inside a row.

From rides_log, split entities into their own tables and move the relationship into a fact table:

Riders   (id PK, name)
Stations (id PK, name, line)
Visits   (id PK, rider_id → Riders.id, station_id → Stations.id, action, fare, balance_after, ts)
Enter fullscreen mode Exit fullscreen mode

Now each column holds a single value, tables have stable keys, and rider/station strings aren’t duplicated all over the log.

Tip: If you do need sets (e.g., a rider’s favorite lines), use a junction table:

CREATE TABLE lines(
  id   INTEGER PRIMARY KEY,
  name TEXT NOT NULL UNIQUE
);
CREATE TABLE rider_lines(
  rider_id INTEGER NOT NULL REFERENCES riders(id),
  line_id  INTEGER NOT NULL REFERENCES lines(id),
  PRIMARY KEY (rider_id, line_id)
);

2NF — Second Normal Form (avoid partial dependency)

When it applies: your table’s primary key is composite (e.g., (rider_id, station_id)).

Rule: Every non‑key column must depend on the whole key—not just part of it.

Why our Visits needs its own id:

If Visits used PK (rider_id, station_id), attributes like action, fare, balance_after, ts describe a single event, not the pair itself. They don’t depend on the entire key; they depend on the visit (an occurrence).

Fix: Give Visits its own surrogate key id (or rely on SQLite’s rowid), so every other column depends on that one key.

CREATE TABLE visits (
  id          INTEGER PRIMARY KEY,
  rider_id    INTEGER NOT NULL REFERENCES riders(id),
  station_id  INTEGER NOT NULL REFERENCES stations(id),
  action      TEXT    NOT NULL CHECK (action IN ('enter','exit')),
  fare        NUMERIC NOT NULL CHECK (fare >= 0),
  balance_after NUMERIC NOT NULL,
  ts          TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
);
Enter fullscreen mode Exit fullscreen mode

3NF — Third Normal Form (avoid transitive dependency)

Rule: Non‑key attributes must not depend on other non‑key attributes (only on the key).

Example trap: Putting station_line into Visits:

  • station_line depends on station_id (a non‑key attribute) which depends on visits.id (the key). That’s a transitive dependency.
  • Fix: Keep line solely in Stations and join when needed.
-- Good: fetch station metadata via JOIN
SELECT v.id, r.name AS rider, s.name AS station, s.line, v.action, v.fare, v.ts
FROM visits v
JOIN riders  r ON r.id = v.rider_id
JOIN stations s ON s.id = v.station_id;
Enter fullscreen mode Exit fullscreen mode

BCNF — Boyce–Codd Normal Form (intuitive view)

Stricter than 3NF. For every non‑trivial dependency X → Y, X must be a key.

If some non‑key attribute (or set) functionally determines others, you likely need to decompose.

Classic BCNF violation:

courses(course_id PK, instructor, room)
Business rule: each instructor always teaches in one fixed room.
So: instructor → room, but instructor is NOT a key  → violates BCNF.
Enter fullscreen mode Exit fullscreen mode

Decompose:

CREATE TABLE instructors(
  instructor TEXT PRIMARY KEY,
  room       TEXT NOT NULL
);
CREATE TABLE courses(
  course_id  INTEGER PRIMARY KEY,
  instructor TEXT NOT NULL REFERENCES instructors(instructor)
);
Enter fullscreen mode Exit fullscreen mode

Quick “am I normalized?” checklist

  • Entities live in their own tables and have keys.
  • Facts (events/relationships) live in junction/event tables.
  • 1NF: No lists in cells; every table has a key.
  • 2NF: With composite keys, every non‑key column depends on the whole key.
  • 3NF: Non‑key columns don’t depend on other non‑key columns.
  • BCNF (nice‑to‑have): If something determines other columns, it should be a key.

Red flags: CSV in a column, duplicate text everywhere, columns that can be derived from a FK target table, “shadow keys” (like station_name uniquely identifying a station but not declared as a key).


Minimal working SQL (SQLite) for the subway example

CREATE TABLE riders (
  id   INTEGER PRIMARY KEY,
  name TEXT NOT NULL
);

CREATE TABLE stations (
  id   INTEGER PRIMARY KEY,
  name TEXT NOT NULL UNIQUE,
  line TEXT NOT NULL
);

CREATE TABLE visits (
  id             INTEGER PRIMARY KEY,
  rider_id       INTEGER NOT NULL,
  station_id     INTEGER NOT NULL,
  action         TEXT    NOT NULL CHECK (action IN ('enter','exit')),
  fare           NUMERIC NOT NULL CHECK (fare >= 0),
  balance_after  NUMERIC NOT NULL,
  ts             TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (rider_id)   REFERENCES riders(id),
  FOREIGN KEY (station_id) REFERENCES stations(id)
);
Enter fullscreen mode Exit fullscreen mode

Query (join facts with entities):

SELECT v.id, r.name AS rider, s.name AS station, s.line, v.action, v.fare, v.balance_after, v.ts
FROM visits v
JOIN riders  r ON r.id = v.rider_id
JOIN stations s ON s.id = v.station_id
ORDER BY v.ts;
Enter fullscreen mode Exit fullscreen mode

Note: In production MBTA-like systems, “riders” are often replaced by cards and visits become swipes. The same normalization logic applies.


TL;DR

  • 1NF: One value per cell; no lists; every table has a key.
  • 2NF: With composite keys, non‑key columns must depend on the whole key (or just give the table a surrogate id).
  • 3NF: Don’t keep attributes that are derivable from another table via a FK (avoid transitive dependencies).
  • BCNF: If a set of columns determines others, that set should be a key—otherwise, decompose.
  • Separate entities from facts, and join them when you need a complete picture.

SQLite Storage Classes vs. Type Affinities

Storage classes (per value):

  • NULL — absence of a value
  • INTEGER — whole numbers (1, 2, …; width chosen by SQLite)
  • REAL — floating-point
  • TEXT — strings
  • BLOB — raw bytes, stored exactly as given

Type affinities (per column):

  • TEXT, NUMERIC, INTEGER, REAL, BLOB

SQLite tries to coerce inserted values to the column’s affinity. Example:

-- TEXT affinity; values get stored as text
CREATE TABLE fares_text(amount TEXT);

INSERT INTO fares_text VALUES (10);   -- stored as "10"
INSERT INTO fares_text VALUES ('25'); -- stored as "25"

-- INTEGER affinity; numeric-looking text coerces to INTEGER
CREATE TABLE fares_int(amount INTEGER);

INSERT INTO fares_int VALUES ('10');  -- stored as 10 (integer)
INSERT INTO fares_int VALUES (25);    -- stored as 25
Enter fullscreen mode Exit fullscreen mode

Money values: Prefer storing cents as INTEGER (240 = $2.40) to avoid floating-point rounding issues. Convert on display.


Creating Tables

CREATE TABLE riders (
  id   INTEGER,
  name TEXT
);
Enter fullscreen mode Exit fullscreen mode

Better with constraints and types (see later sections), but start simple and iterate.

Pro tip: Maintain a schema.sql you can .read into SQLite to recreate your DB from scratch.

sqlite3 mbta.db
.read schema.sql
.schema
Enter fullscreen mode Exit fullscreen mode

Keys: Primary, Foreign, and Composite

Primary Keys (PK)

Uniquely identify rows. In SQLite, you can:

  • Declare an explicit PK:
  CREATE TABLE stations (
    id   INTEGER,
    name TEXT NOT NULL UNIQUE,
    line TEXT NOT NULL,
    PRIMARY KEY (id)
  );
Enter fullscreen mode Exit fullscreen mode
  • Or rely on the hidden rowid (SQLite’s implicit PK) — but explicit is clearer.

Foreign Keys (FK)

Reference a PK in another table. Enable enforcement in SQLite:

PRAGMA foreign_keys = ON;
Enter fullscreen mode Exit fullscreen mode

Example:

CREATE TABLE visits (
  id         INTEGER,
  rider_id   INTEGER,
  station_id INTEGER,
  PRIMARY KEY (id),
  FOREIGN KEY (rider_id)   REFERENCES riders(id),
  FOREIGN KEY (station_id) REFERENCES stations(id)
);
Enter fullscreen mode Exit fullscreen mode

Composite Keys

Sometimes the combination of columns is the key:

CREATE TABLE enrollment (
  student_id INTEGER,
  course_id  INTEGER,
  PRIMARY KEY (student_id, course_id)
);
Enter fullscreen mode Exit fullscreen mode

Use composite keys when the pair is truly unique and repeated events don’t make sense. If events repeat (e.g., multiple swipes at the same station), use a surrogate id instead.


Column & Table Constraints

  • NOT NULL — required column
  • UNIQUE — no duplicate values in the column
  • CHECK (...) — custom rules
  • DEFAULT <expr> — value to use when none provided
  • PRIMARY KEY (...), FOREIGN KEY (...) REFERENCES ... — table constraints

Examples:

-- Stations that *must* have a name and line; names must be unique
CREATE TABLE stations (
  id   INTEGER,
  name TEXT NOT NULL UNIQUE,
  line TEXT NOT NULL,
  PRIMARY KEY (id)
);

-- Visits with FKs
CREATE TABLE visits (
  id         INTEGER,
  rider_id   INTEGER,
  station_id INTEGER,
  action     TEXT CHECK (action IN ('enter','exit')),
  fare_cents INTEGER CHECK (fare_cents >= 0),
  PRIMARY KEY (id),
  FOREIGN KEY (rider_id)   REFERENCES riders(id),
  FOREIGN KEY (station_id) REFERENCES stations(id)
);
Enter fullscreen mode Exit fullscreen mode

DEFAULT + timestamps:

CREATE TABLE events (
  id        INTEGER PRIMARY KEY,
  happened  NUMERIC NOT NULL DEFAULT CURRENT_TIMESTAMP
);
Enter fullscreen mode Exit fullscreen mode

NUMERIC affinity works well for CURRENT_TIMESTAMP in SQLite.


Evolving a Schema: ALTER TABLE & DROP TABLE

You won’t get it perfect the first time. Change it safely:

-- Rename a table
ALTER TABLE visits RENAME TO swipes;

-- Add a column
ALTER TABLE swipes ADD COLUMN type TEXT;

-- Rename a column
ALTER TABLE swipes RENAME COLUMN type TO swipe_type;

-- Drop a column (SQLite ≥ 3.35.0)
ALTER TABLE swipes DROP COLUMN swipe_type;

-- Delete a table
DROP TABLE riders;
Enter fullscreen mode Exit fullscreen mode

Prefer editing schema.sql and rebuilding during early design; use ALTER TABLE for production migrations.


Design Gotchas & Practical Tips

  • Booleans in SQLite: No native BOOLEAN; use INTEGER with 0/1 or TEXT with 'true'/'false'.
  • Foreign keys are off by default: PRAGMA foreign_keys = ON; per connection.
  • Money: Store cents as INTEGER. Convert to dollars at the edges.
  • Timestamps: NUMERIC + DEFAULT CURRENT_TIMESTAMP is fine. If you need time zones, store UTC.
  • Composite vs. surrogate PKs: Use composite keys when the pair naturally identifies a row and repeats don’t happen. Otherwise, prefer a surrogate id.
  • Indexes: Add indexes on FKs/lookup columns when data grows:
  CREATE INDEX idx_swipes_card ON swipes(card_id);
  CREATE INDEX idx_swipes_station ON swipes(station_id);
Enter fullscreen mode Exit fullscreen mode
  • Data validation: Push domain rules into CHECKs where possible (e.g., sign of amount_cents based on type).
  • Migrations: Keep schema.sql under version control; add seed.sql for small demo data.
  • Modes: For nicer output while exploring:
  .headers on
  .mode table
Enter fullscreen mode Exit fullscreen mode

Cheat Sheet: Commands & Patterns

-- Inspect
.schema
.schema table_name

-- Create
CREATE TABLE name (...);

-- Keys & constraints (table-level)
PRIMARY KEY (col [, col2])
FOREIGN KEY (col) REFERENCES other(col)

-- Column constraints
NOT NULL
UNIQUE
CHECK (expr)
DEFAULT expr

-- Modify
ALTER TABLE t RENAME TO new_t;
ALTER TABLE t ADD COLUMN col TYPE [constraints];
ALTER TABLE t RENAME COLUMN old TO new;
ALTER TABLE t DROP COLUMN col;  -- requires SQLite 3.35+

DROP TABLE t;

-- Enable FK enforcement
PRAGMA foreign_keys = ON;
Enter fullscreen mode Exit fullscreen mode

Further Practice Ideas

  1. Extend cards: add a status column and a CHECK to block use when status='blocked'.
  2. Fare rules: Implement a peak/off-peak fare table; join in a trigger to pick the right fare at insert time.
  3. Transfers: Model free transfers within N minutes with a swipe_window rule.
  4. Reporting queries:
    • Top 10 busiest stations (by swipes) per day.
    • Average deposit size per month.
    • Cards with negative balance inserts (should be impossible if checks are correct!).

Top comments (0)