Md Enayetur Rahman

Posted on Oct 18

CS50 SQL — Lecture 5 Notes & Guide: Optimization, Indexes, and Concurrency

CS50 SQL — Lecture 5 Notes & Guide: Optimization, Indexes, and Concurrency

Why Optimize?

Your app may execute millions of queries. Shaving 80 ms → 10 ms quickly compounds into user-perceived snappiness and lower compute costs.
Typical goals:
- Reduce runtime (fewer rows touched; smarter access paths).
- Reduce storage (smaller DB files; less I/O).
- Handle concurrency safely (correctness when many clients hit the DB at once).

In lecture, we explored an IMDb-like SQLite DB with:

400k+ movies, 1.4M people, 200k+ ratings, and a stars junction table for many-to-many relationships.

Measuring Query Time & Plans

Timing (SQLite shell):

.timer on
SELECT * FROM "movies" WHERE "title" = 'Cars';
.timer off

See the access plan your DB intends to use:

EXPLAIN QUERY PLAN
SELECT * FROM "movies" WHERE "title" = 'Cars';

Look for hints such as SCAN (full table scan) vs SEARCH ... USING INDEX ....

Table Scans vs. Indexes

Table scan: DB reads many (or all) rows, testing the predicate (title = 'Cars'). Works but scales poorly.
Index: a separate, sorted structure that maps key(s) → row locations. Queries can do logarithmic lookups instead of reading everything.

Create a simple index on movies(title):

CREATE INDEX IF NOT EXISTS title_index ON movies(title);

Re-run your query and the plan. You should now see USING INDEX title_index with a large runtime drop.

💡 Indexes help most when a predicate is selective (returns a small fraction of rows).

Building Better Indexes

Single-column indexes

Good for frequent filters/joins:

CREATE INDEX IF NOT EXISTS people_name_index ON people(name);
CREATE INDEX IF NOT EXISTS movies_year_index  ON movies(year);

Explaining & verifying usage

EXPLAIN QUERY PLAN
SELECT title
FROM movies
WHERE id IN (
  SELECT movie_id
  FROM stars
  WHERE person_id = (
    SELECT id FROM people WHERE name = 'Tom Hanks'
  )
);

Without indexes, expect multiple SCAN steps.
With indexes on people(name) and stars(person_id), the plan should show SEARCH using those indexes.

Composite indexes & covering indexes

A composite index includes multiple columns in a defined order. A covering index contains all columns a subquery needs, letting SQLite avoid extra table lookups.

Example: accelerate the inner lookup for Tom Hanks’ filmography:

-- (Re)create composite index that can *cover* person_id → movie_id lookups
DROP INDEX IF EXISTS stars_person_index;
CREATE INDEX stars_person_index ON stars(person_id, movie_id);

-- People name search
CREATE INDEX IF NOT EXISTS people_name_index ON people(name);

Now the plan can show USING COVERING INDEX, since movie_id is in the index leaf itself.

📌 Column order matters. Put equality predicates first. For queries like WHERE person_id = ? then reading movie_id, (person_id, movie_id) is a great fit.

Partial indexes

Index only the rows you filter on frequently—great when space matters or your workload is skewed.

-- Only index 2023 releases for faster “recent” searches
CREATE INDEX IF NOT EXISTS movies_recent_2023
ON movies(title)
WHERE year = 2023;

EXPLAIN QUERY PLAN
SELECT title FROM movies WHERE year = 2023;   -- can use movies_recent_2023

EXPLAIN QUERY PLAN
SELECT title FROM movies WHERE year = 1998;   -- cannot use the partial index

Under the Hood: B-Trees in 5 minutes

Indexes are stored as B-Trees (balanced trees). Think “book index”: jump to the right page range, then refine.
Nodes contain sorted keys (e.g., titles) and pointers to children. Leaves store key + row location (or the columns themselves for covering indexes).
Search is logarithmic: at each step, choose the subtree that must contain the key.

Trade-offs:

Pros: very fast lookups/joins/sorts on indexed columns.
Cons: extra disk space; slower writes (DB maintains the tree on INSERT/UPDATE/DELETE).

Don’t index everything. Index what you filter or join on frequently, and keep write costs in mind.

Storage & Space: `VACUUM` and Cleanup

When you DROP INDEX or delete many rows, SQLite marks pages as reusable but doesn’t shrink the file immediately.

Reclaim space with:

VACUUM;

This compacts the file and returns unused pages to the OS.
Verify size at the shell (not SQL):

du -b movies.db

Use VACUUM after large deletions / index drops, or as part of periodic maintenance.

Concurrency & Transactions

ACID in practice

Atomicity: a transaction happens all or nothing.
Consistency: constraints are respected (e.g., CHECK (balance >= 0)).
Isolation: concurrent transactions don’t step on each other’s toes.
Durability: once committed, data survives crashes.

Begin, Commit, Rollback

Bank demo (Alice pays Bob $10), with atomic transfer:

BEGIN TRANSACTION;

UPDATE accounts
SET balance = balance + 10
WHERE id = 2;  -- Bob

UPDATE accounts
SET balance = balance - 10
WHERE id = 1;  -- Alice

COMMIT;  -- both changes become visible at once

If a constraint fails (e.g., Alice would go negative), rollback:

BEGIN TRANSACTION;

UPDATE accounts SET balance = balance + 10 WHERE id = 2;
UPDATE accounts SET balance = balance - 10 WHERE id = 1;

-- Suppose the second UPDATE fails: revert everything
ROLLBACK;

Observation from lecture: during the transaction, another session that reads accounts does not see the intermediate state (Bob +$10 while Alice hasn’t been debited yet). Only after COMMIT are both updates visible together.

Race conditions & isolation

A classic attack: two concurrent transfers that each check a balance before the other debits, leading to double-spend if not isolated. Solution: process transfers as transactions and ensure isolation so they occur sequentially from the database’s perspective.

First transfer commits (Alice +$30, Charlie −$30).
Second transfer reads the updated balance (Charlie now $0) and correctly fails or rolls back.
Only then can Alice attempt an ATM withdrawal.

Locks in SQLite

SQLite uses locks to coordinate access:

Shared lock: multiple readers allowed concurrently.
Exclusive lock: a writer needs exclusive access; others must wait.

You can see “database is locked” errors when an exclusive transaction is in progress.

-- Forces an exclusive write transaction
BEGIN EXCLUSIVE TRANSACTION;
-- ... do writes
COMMIT;

Locks can be at the database level in SQLite. Other DBMSs may lock at coarser or finer granularities (table/page/row) depending on engine and settings.

Practical Checklists

Indexing checklist

❑ Do I frequently filter/join on this column (or pair of columns)?
❑ Is the predicate selective (returns a small fraction of rows)?
❑ Do I need a composite index? Put equality terms first.
❑ Could a covering index eliminate extra table lookups?
❑ Would a partial index save space while serving hot queries?
❑ What’s the expected write rate? (Heavy writes + many indexes = slower).

Query tuning checklist

❑ Measure: .timer on and EXPLAIN QUERY PLAN.
❑ Reduce row work early (filter as soon as possible).
❑ Avoid SELECT * in large joins—project only needed columns.
❑ Watch for implicit type mismatches (e.g., comparing text ↔ integer).
❑ Consider pre-computed/materialized views for heavy aggregates (if allowed).

Concurrency & correctness

❑ Wrap multi-step operations in a transaction (BEGIN/COMMIT).
❑ Use constraints (CHECK, UNIQUE, FOREIGN KEY) as safety nets.
❑ Handle failures: ROLLBACK on errors or insufficient funds.
❑ Be mindful of SQLite’s locking model; keep transactions short.

Space management

❑ Drop obsolete indexes.
❑ VACUUM after large deletions / index drops.
❑ Periodically review which indexes are actually being used.

Handy Snippets & Commands

Find table definitions:

.schema
.schema movies

List indexes for a table:

PRAGMA index_list('stars');
-- Then, see columns in an index:
PRAGMA index_info('stars_person_index');

Timing & plan:

.timer on
EXPLAIN QUERY PLAN
SELECT title FROM movies WHERE title = 'Cars';

Create/Drop indexes:

CREATE INDEX IF NOT EXISTS title_index ON movies(title);
DROP INDEX IF EXISTS title_index;

Transactions:

BEGIN TRANSACTION;
-- statements
COMMIT;   -- or ROLLBACK;

Force exclusive write (demo):

BEGIN EXCLUSIVE TRANSACTION;
-- ...writes...
COMMIT;

Reclaim space:

VACUUM;

Key Takeaways

Indexes are your main lever for speed—use them carefully and verify with EXPLAIN QUERY PLAN.
Covering & partial indexes multiply gains when applied thoughtfully.
Transactions ensure correctness with ACID; locks enforce isolation.
VACUUM reclaims space after big structural changes.

Happy optimizing! 🚀

DEV Community

CS50 SQL — Lecture 5 Notes & Guide: Optimization, Indexes, and Concurrency

Table of Contents

Why Optimize?

Measuring Query Time & Plans

Table Scans vs. Indexes

Building Better Indexes

Single-column indexes

Explaining & verifying usage

Composite indexes & covering indexes

Partial indexes

Under the Hood: B-Trees in 5 minutes

Storage & Space: `VACUUM` and Cleanup

Concurrency & Transactions

ACID in practice

Begin, Commit, Rollback

Race conditions & isolation

Locks in SQLite

Practical Checklists

Indexing checklist

Query tuning checklist

Concurrency & correctness

Space management

Handy Snippets & Commands

Key Takeaways

Top comments (0)

Table of Contents

Why Optimize?

Measuring Query Time & Plans

Table Scans vs. Indexes

Building Better Indexes

Single-column indexes

Explaining & verifying usage

Composite indexes & covering indexes

Partial indexes

Under the Hood: B-Trees in 5 minutes

Storage & Space: VACUUM and Cleanup

Concurrency & Transactions

ACID in practice

Begin, Commit, Rollback

Race conditions & isolation

Locks in SQLite

Practical Checklists

Indexing checklist

Query tuning checklist

Concurrency & correctness

Space management

Handy Snippets & Commands

Key Takeaways

Storage & Space: `VACUUM` and Cleanup