Table of Contents
-
CS50 SQL — Lecture 5 Notes & Guide: Optimization, Indexes, and Concurrency
- Table of Contents
- Why Optimize?
- Measuring Query Time & Plans
- Table Scans vs. Indexes
- Building Better Indexes
- Single-column indexes
- Explaining & verifying usage
- Composite indexes & covering indexes
- Partial indexes
- Under the Hood: B-Trees in 5 minutes
- Storage & Space:
VACUUMand Cleanup - Concurrency & Transactions
- ACID in practice
- Begin, Commit, Rollback
- Race conditions & isolation
- Locks in SQLite
- Practical Checklists
- Indexing checklist
- Query tuning checklist
- Concurrency & correctness
- Space management
- Handy Snippets & Commands
- Key Takeaways
Why Optimize?
- Your app may execute millions of queries. Shaving 80 ms → 10 ms quickly compounds into user-perceived snappiness and lower compute costs.
- Typical goals:
- Reduce runtime (fewer rows touched; smarter access paths).
- Reduce storage (smaller DB files; less I/O).
- Handle concurrency safely (correctness when many clients hit the DB at once).
In lecture, we explored an IMDb-like SQLite DB with:
-
400k+ movies, 1.4M people, 200k+ ratings, and a
starsjunction table for many-to-many relationships.
Measuring Query Time & Plans
Timing (SQLite shell):
.timer on
SELECT * FROM "movies" WHERE "title" = 'Cars';
.timer off
See the access plan your DB intends to use:
EXPLAIN QUERY PLAN
SELECT * FROM "movies" WHERE "title" = 'Cars';
Look for hints such as SCAN (full table scan) vs SEARCH ... USING INDEX ....
Table Scans vs. Indexes
-
Table scan: DB reads many (or all) rows, testing the predicate (
title = 'Cars'). Works but scales poorly. - Index: a separate, sorted structure that maps key(s) → row locations. Queries can do logarithmic lookups instead of reading everything.
Create a simple index on movies(title):
CREATE INDEX IF NOT EXISTS title_index ON movies(title);
Re-run your query and the plan. You should now see USING INDEX title_index with a large runtime drop.
💡 Indexes help most when a predicate is selective (returns a small fraction of rows).
Building Better Indexes
Single-column indexes
Good for frequent filters/joins:
CREATE INDEX IF NOT EXISTS people_name_index ON people(name);
CREATE INDEX IF NOT EXISTS movies_year_index ON movies(year);
Explaining & verifying usage
EXPLAIN QUERY PLAN
SELECT title
FROM movies
WHERE id IN (
SELECT movie_id
FROM stars
WHERE person_id = (
SELECT id FROM people WHERE name = 'Tom Hanks'
)
);
- Without indexes, expect multiple
SCANsteps. - With indexes on
people(name)andstars(person_id), the plan should showSEARCHusing those indexes.
Composite indexes & covering indexes
A composite index includes multiple columns in a defined order. A covering index contains all columns a subquery needs, letting SQLite avoid extra table lookups.
Example: accelerate the inner lookup for Tom Hanks’ filmography:
-- (Re)create composite index that can *cover* person_id → movie_id lookups
DROP INDEX IF EXISTS stars_person_index;
CREATE INDEX stars_person_index ON stars(person_id, movie_id);
-- People name search
CREATE INDEX IF NOT EXISTS people_name_index ON people(name);
Now the plan can show USING COVERING INDEX, since movie_id is in the index leaf itself.
📌 Column order matters. Put equality predicates first. For queries like
WHERE person_id = ?then readingmovie_id,(person_id, movie_id)is a great fit.
Partial indexes
Index only the rows you filter on frequently—great when space matters or your workload is skewed.
-- Only index 2023 releases for faster “recent” searches
CREATE INDEX IF NOT EXISTS movies_recent_2023
ON movies(title)
WHERE year = 2023;
EXPLAIN QUERY PLAN
SELECT title FROM movies WHERE year = 2023; -- can use movies_recent_2023
EXPLAIN QUERY PLAN
SELECT title FROM movies WHERE year = 1998; -- cannot use the partial index
Under the Hood: B-Trees in 5 minutes
- Indexes are stored as B-Trees (balanced trees). Think “book index”: jump to the right page range, then refine.
- Nodes contain sorted keys (e.g., titles) and pointers to children. Leaves store key + row location (or the columns themselves for covering indexes).
- Search is logarithmic: at each step, choose the subtree that must contain the key.
Trade-offs:
- Pros: very fast lookups/joins/sorts on indexed columns.
- Cons: extra disk space; slower writes (DB maintains the tree on INSERT/UPDATE/DELETE).
Don’t index everything. Index what you filter or join on frequently, and keep write costs in mind.
Storage & Space: VACUUM and Cleanup
When you DROP INDEX or delete many rows, SQLite marks pages as reusable but doesn’t shrink the file immediately.
- Reclaim space with:
VACUUM;
- This compacts the file and returns unused pages to the OS.
- Verify size at the shell (not SQL):
du -b movies.db
Use VACUUM after large deletions / index drops, or as part of periodic maintenance.
Concurrency & Transactions
ACID in practice
- Atomicity: a transaction happens all or nothing.
-
Consistency: constraints are respected (e.g.,
CHECK (balance >= 0)). - Isolation: concurrent transactions don’t step on each other’s toes.
- Durability: once committed, data survives crashes.
Begin, Commit, Rollback
Bank demo (Alice pays Bob $10), with atomic transfer:
BEGIN TRANSACTION;
UPDATE accounts
SET balance = balance + 10
WHERE id = 2; -- Bob
UPDATE accounts
SET balance = balance - 10
WHERE id = 1; -- Alice
COMMIT; -- both changes become visible at once
If a constraint fails (e.g., Alice would go negative), rollback:
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance + 10 WHERE id = 2;
UPDATE accounts SET balance = balance - 10 WHERE id = 1;
-- Suppose the second UPDATE fails: revert everything
ROLLBACK;
Observation from lecture: during the transaction, another session that reads accounts does not see the intermediate state (Bob +$10 while Alice hasn’t been debited yet). Only after COMMIT are both updates visible together.
Race conditions & isolation
A classic attack: two concurrent transfers that each check a balance before the other debits, leading to double-spend if not isolated. Solution: process transfers as transactions and ensure isolation so they occur sequentially from the database’s perspective.
- First transfer commits (Alice +$30, Charlie −$30).
- Second transfer reads the updated balance (Charlie now $0) and correctly fails or rolls back.
- Only then can Alice attempt an ATM withdrawal.
Locks in SQLite
SQLite uses locks to coordinate access:
- Shared lock: multiple readers allowed concurrently.
- Exclusive lock: a writer needs exclusive access; others must wait.
You can see “database is locked” errors when an exclusive transaction is in progress.
-- Forces an exclusive write transaction
BEGIN EXCLUSIVE TRANSACTION;
-- ... do writes
COMMIT;
Locks can be at the database level in SQLite. Other DBMSs may lock at coarser or finer granularities (table/page/row) depending on engine and settings.
Practical Checklists
Indexing checklist
- ❑ Do I frequently filter/join on this column (or pair of columns)?
- ❑ Is the predicate selective (returns a small fraction of rows)?
- ❑ Do I need a composite index? Put equality terms first.
- ❑ Could a covering index eliminate extra table lookups?
- ❑ Would a partial index save space while serving hot queries?
- ❑ What’s the expected write rate? (Heavy writes + many indexes = slower).
Query tuning checklist
- ❑ Measure:
.timer onandEXPLAIN QUERY PLAN. - ❑ Reduce row work early (filter as soon as possible).
- ❑ Avoid
SELECT *in large joins—project only needed columns. - ❑ Watch for implicit type mismatches (e.g., comparing text ↔ integer).
- ❑ Consider pre-computed/materialized views for heavy aggregates (if allowed).
Concurrency & correctness
- ❑ Wrap multi-step operations in a transaction (
BEGIN/COMMIT). - ❑ Use constraints (
CHECK,UNIQUE,FOREIGN KEY) as safety nets. - ❑ Handle failures:
ROLLBACKon errors or insufficient funds. - ❑ Be mindful of SQLite’s locking model; keep transactions short.
Space management
- ❑ Drop obsolete indexes.
- ❑
VACUUMafter large deletions / index drops. - ❑ Periodically review which indexes are actually being used.
Handy Snippets & Commands
Find table definitions:
.schema
.schema movies
List indexes for a table:
PRAGMA index_list('stars');
-- Then, see columns in an index:
PRAGMA index_info('stars_person_index');
Timing & plan:
.timer on
EXPLAIN QUERY PLAN
SELECT title FROM movies WHERE title = 'Cars';
Create/Drop indexes:
CREATE INDEX IF NOT EXISTS title_index ON movies(title);
DROP INDEX IF EXISTS title_index;
Transactions:
BEGIN TRANSACTION;
-- statements
COMMIT; -- or ROLLBACK;
Force exclusive write (demo):
BEGIN EXCLUSIVE TRANSACTION;
-- ...writes...
COMMIT;
Reclaim space:
VACUUM;
Key Takeaways
-
Indexes are your main lever for speed—use them carefully and verify with
EXPLAIN QUERY PLAN. - Covering & partial indexes multiply gains when applied thoughtfully.
- Transactions ensure correctness with ACID; locks enforce isolation.
- VACUUM reclaims space after big structural changes.
Happy optimizing! 🚀
Top comments (0)