DEV Community

Cover image for CS50 SQL — Lecture 5 Notes & Guide: **Optimization, Indexes, and Concurrency**
Md Enayetur Rahman
Md Enayetur Rahman

Posted on

CS50 SQL — Lecture 5 Notes & Guide: **Optimization, Indexes, and Concurrency**

Table of Contents


Why Optimize?

  • Your app may execute millions of queries. Shaving 80 ms → 10 ms quickly compounds into user-perceived snappiness and lower compute costs.
  • Typical goals:
    • Reduce runtime (fewer rows touched; smarter access paths).
    • Reduce storage (smaller DB files; less I/O).
    • Handle concurrency safely (correctness when many clients hit the DB at once).

In lecture, we explored an IMDb-like SQLite DB with:

  • 400k+ movies, 1.4M people, 200k+ ratings, and a stars junction table for many-to-many relationships.

Measuring Query Time & Plans

Timing (SQLite shell):

.timer on
SELECT * FROM "movies" WHERE "title" = 'Cars';
.timer off
Enter fullscreen mode Exit fullscreen mode

See the access plan your DB intends to use:

EXPLAIN QUERY PLAN
SELECT * FROM "movies" WHERE "title" = 'Cars';
Enter fullscreen mode Exit fullscreen mode

Look for hints such as SCAN (full table scan) vs SEARCH ... USING INDEX ....


Table Scans vs. Indexes

  • Table scan: DB reads many (or all) rows, testing the predicate (title = 'Cars'). Works but scales poorly.
  • Index: a separate, sorted structure that maps key(s) → row locations. Queries can do logarithmic lookups instead of reading everything.

Create a simple index on movies(title):

CREATE INDEX IF NOT EXISTS title_index ON movies(title);
Enter fullscreen mode Exit fullscreen mode

Re-run your query and the plan. You should now see USING INDEX title_index with a large runtime drop.

💡 Indexes help most when a predicate is selective (returns a small fraction of rows).


Building Better Indexes

Single-column indexes

Good for frequent filters/joins:

CREATE INDEX IF NOT EXISTS people_name_index ON people(name);
CREATE INDEX IF NOT EXISTS movies_year_index  ON movies(year);
Enter fullscreen mode Exit fullscreen mode

Explaining & verifying usage

EXPLAIN QUERY PLAN
SELECT title
FROM movies
WHERE id IN (
  SELECT movie_id
  FROM stars
  WHERE person_id = (
    SELECT id FROM people WHERE name = 'Tom Hanks'
  )
);
Enter fullscreen mode Exit fullscreen mode
  • Without indexes, expect multiple SCAN steps.
  • With indexes on people(name) and stars(person_id), the plan should show SEARCH using those indexes.

Composite indexes & covering indexes

A composite index includes multiple columns in a defined order. A covering index contains all columns a subquery needs, letting SQLite avoid extra table lookups.

Example: accelerate the inner lookup for Tom Hanks’ filmography:

-- (Re)create composite index that can *cover* person_id → movie_id lookups
DROP INDEX IF EXISTS stars_person_index;
CREATE INDEX stars_person_index ON stars(person_id, movie_id);

-- People name search
CREATE INDEX IF NOT EXISTS people_name_index ON people(name);
Enter fullscreen mode Exit fullscreen mode

Now the plan can show USING COVERING INDEX, since movie_id is in the index leaf itself.

📌 Column order matters. Put equality predicates first. For queries like WHERE person_id = ? then reading movie_id, (person_id, movie_id) is a great fit.

Partial indexes

Index only the rows you filter on frequently—great when space matters or your workload is skewed.

-- Only index 2023 releases for faster “recent” searches
CREATE INDEX IF NOT EXISTS movies_recent_2023
ON movies(title)
WHERE year = 2023;

EXPLAIN QUERY PLAN
SELECT title FROM movies WHERE year = 2023;   -- can use movies_recent_2023

EXPLAIN QUERY PLAN
SELECT title FROM movies WHERE year = 1998;   -- cannot use the partial index
Enter fullscreen mode Exit fullscreen mode

Under the Hood: B-Trees in 5 minutes

  • Indexes are stored as B-Trees (balanced trees). Think “book index”: jump to the right page range, then refine.
  • Nodes contain sorted keys (e.g., titles) and pointers to children. Leaves store key + row location (or the columns themselves for covering indexes).
  • Search is logarithmic: at each step, choose the subtree that must contain the key.

Trade-offs:

  • Pros: very fast lookups/joins/sorts on indexed columns.
  • Cons: extra disk space; slower writes (DB maintains the tree on INSERT/UPDATE/DELETE).

Don’t index everything. Index what you filter or join on frequently, and keep write costs in mind.


Storage & Space: VACUUM and Cleanup

When you DROP INDEX or delete many rows, SQLite marks pages as reusable but doesn’t shrink the file immediately.

  • Reclaim space with:
VACUUM;
Enter fullscreen mode Exit fullscreen mode
  • This compacts the file and returns unused pages to the OS.
  • Verify size at the shell (not SQL):
du -b movies.db
Enter fullscreen mode Exit fullscreen mode

Use VACUUM after large deletions / index drops, or as part of periodic maintenance.


Concurrency & Transactions

ACID in practice

  • Atomicity: a transaction happens all or nothing.
  • Consistency: constraints are respected (e.g., CHECK (balance >= 0)).
  • Isolation: concurrent transactions don’t step on each other’s toes.
  • Durability: once committed, data survives crashes.

Begin, Commit, Rollback

Bank demo (Alice pays Bob $10), with atomic transfer:

BEGIN TRANSACTION;

UPDATE accounts
SET balance = balance + 10
WHERE id = 2;  -- Bob

UPDATE accounts
SET balance = balance - 10
WHERE id = 1;  -- Alice

COMMIT;  -- both changes become visible at once
Enter fullscreen mode Exit fullscreen mode

If a constraint fails (e.g., Alice would go negative), rollback:

BEGIN TRANSACTION;

UPDATE accounts SET balance = balance + 10 WHERE id = 2;
UPDATE accounts SET balance = balance - 10 WHERE id = 1;

-- Suppose the second UPDATE fails: revert everything
ROLLBACK;
Enter fullscreen mode Exit fullscreen mode

Observation from lecture: during the transaction, another session that reads accounts does not see the intermediate state (Bob +$10 while Alice hasn’t been debited yet). Only after COMMIT are both updates visible together.

Race conditions & isolation

A classic attack: two concurrent transfers that each check a balance before the other debits, leading to double-spend if not isolated. Solution: process transfers as transactions and ensure isolation so they occur sequentially from the database’s perspective.

  • First transfer commits (Alice +$30, Charlie −$30).
  • Second transfer reads the updated balance (Charlie now $0) and correctly fails or rolls back.
  • Only then can Alice attempt an ATM withdrawal.

Locks in SQLite

SQLite uses locks to coordinate access:

  • Shared lock: multiple readers allowed concurrently.
  • Exclusive lock: a writer needs exclusive access; others must wait.

You can see “database is locked” errors when an exclusive transaction is in progress.

-- Forces an exclusive write transaction
BEGIN EXCLUSIVE TRANSACTION;
-- ... do writes
COMMIT;
Enter fullscreen mode Exit fullscreen mode

Locks can be at the database level in SQLite. Other DBMSs may lock at coarser or finer granularities (table/page/row) depending on engine and settings.


Practical Checklists

Indexing checklist

  • ❑ Do I frequently filter/join on this column (or pair of columns)?
  • ❑ Is the predicate selective (returns a small fraction of rows)?
  • ❑ Do I need a composite index? Put equality terms first.
  • ❑ Could a covering index eliminate extra table lookups?
  • ❑ Would a partial index save space while serving hot queries?
  • ❑ What’s the expected write rate? (Heavy writes + many indexes = slower).

Query tuning checklist

  • ❑ Measure: .timer on and EXPLAIN QUERY PLAN.
  • ❑ Reduce row work early (filter as soon as possible).
  • ❑ Avoid SELECT * in large joins—project only needed columns.
  • ❑ Watch for implicit type mismatches (e.g., comparing text ↔ integer).
  • ❑ Consider pre-computed/materialized views for heavy aggregates (if allowed).

Concurrency & correctness

  • ❑ Wrap multi-step operations in a transaction (BEGIN/COMMIT).
  • ❑ Use constraints (CHECK, UNIQUE, FOREIGN KEY) as safety nets.
  • ❑ Handle failures: ROLLBACK on errors or insufficient funds.
  • ❑ Be mindful of SQLite’s locking model; keep transactions short.

Space management

  • ❑ Drop obsolete indexes.
  • VACUUM after large deletions / index drops.
  • ❑ Periodically review which indexes are actually being used.

Handy Snippets & Commands

Find table definitions:

.schema
.schema movies
Enter fullscreen mode Exit fullscreen mode

List indexes for a table:

PRAGMA index_list('stars');
-- Then, see columns in an index:
PRAGMA index_info('stars_person_index');
Enter fullscreen mode Exit fullscreen mode

Timing & plan:

.timer on
EXPLAIN QUERY PLAN
SELECT title FROM movies WHERE title = 'Cars';
Enter fullscreen mode Exit fullscreen mode

Create/Drop indexes:

CREATE INDEX IF NOT EXISTS title_index ON movies(title);
DROP INDEX IF EXISTS title_index;
Enter fullscreen mode Exit fullscreen mode

Transactions:

BEGIN TRANSACTION;
-- statements
COMMIT;   -- or ROLLBACK;
Enter fullscreen mode Exit fullscreen mode

Force exclusive write (demo):

BEGIN EXCLUSIVE TRANSACTION;
-- ...writes...
COMMIT;
Enter fullscreen mode Exit fullscreen mode

Reclaim space:

VACUUM;
Enter fullscreen mode Exit fullscreen mode

Key Takeaways

  • Indexes are your main lever for speed—use them carefully and verify with EXPLAIN QUERY PLAN.
  • Covering & partial indexes multiply gains when applied thoughtfully.
  • Transactions ensure correctness with ACID; locks enforce isolation.
  • VACUUM reclaims space after big structural changes.

Happy optimizing! 🚀

Top comments (0)