DEV Community

Slava Rozhnev
Slava Rozhnev

Posted on

Why not Sakila? Building a Modern SQL Learning Database for MariaDB

If you've taught SQL or learned it from a course, you've almost certainly met Sakila. The little DVD rental database has been the go-to sample schema for MySQL and MariaDB tutorials for nearly two decades. It's clean, well-normalized, and comes pre-loaded with enough data to write interesting queries.

But here's the thing: Sakila was designed in 2006 for MySQL 5.0.

A lot has changed since then.


What Sakila can't show you

Open the Sakila schema today and count the column types: INT, VARCHAR, TEXT, ENUM, DATETIME, DECIMAL, TINYINT. That's it. No JSON. No FULLTEXT search beyond a basic demo. No SET type. And absolutely no VECTOR.

Meanwhile, MariaDB 11.7 ships with:

  • Native VECTOR(N) type for AI-era similarity search
  • Rich JSON functions (JSON_TABLE, JSON_VALUE, JSON_EXTRACT)
  • Window functions (RANK, LAG, LEAD, running aggregates)
  • Recursive CTEs
  • FULLTEXT with boolean mode and relevance scoring

Teaching someone SQL with Sakila in 2026 is like teaching someone to drive in a car with no GPS, no reversing camera, and a manual choke. The fundamentals still apply — but they're missing a huge chunk of what the tool can actually do.

There's also the domain problem. DVD rentals. If you're under 35, you may never have set foot in a video rental shop. The mental model is unfamiliar and the business rules feel arbitrary. What's the difference between a film, an inventory item, and a rental? Why does payment exist independently of rental? Explaining the schema takes time that should go into explaining SQL.


The domain that everyone understands

I needed something universal. Something with:

  • Obvious entities and relationships
  • Multiple natural hierarchies (good for recursive CTEs)
  • A mix of small lookup tables and large transactional tables
  • A clear reason for JSON (semi-structured data)
  • An excuse to use vectors (semantic search is everywhere now)

A university fits perfectly. Students, courses, faculty, enrollments, grades — everyone has lived inside this system. The relationships are intuitive. And the domain naturally produces the data shapes I needed:

Need University equivalent
Big analytic table grade_events — 120 000+ rows of scored items
Hierarchy for recursive CTE Departments (Faculty → Department → Sub-dept) and course prerequisites
JSON for semi-structured data Faculty office hours, student emergency contacts, grant funding details
FULLTEXT search Course descriptions, publication abstracts
VECTOR search Course semantic embeddings for "find similar courses"
Audit trail Every enrollment and grade change logged with JSON diffs

What University DB looks like

The schema has 16 tables in four tiers:

Lookup (tiny, < 100 rows): semesters, rooms, scholarships

Domain (small, up to 2 000 rows): departments, faculty, students, courses, course_prerequisites, sections

Transactional (medium): enrollments, student_scholarships, research_projects, publications, project_members

Analytic (big): grade_events (~120k rows) and audit_log (~60k rows, populated by triggers)

Every significant MariaDB datatype appears at least once:

-- VECTOR on courses — semantic embeddings for similarity search
embedding  VECTOR(1536) NULL

-- JSON on faculty — semi-structured office hours
office_hours JSON NULL
-- [{"day":"Mon","start":"10:00","end":"12:00"}, ...]

-- SET on publications — multi-value keyword tags
keywords SET('AI','ML','Databases','Security','Bioinformatics', ...) 

-- FULLTEXT on courses and publications
FULLTEXT KEY ft_course (title, description)
Enter fullscreen mode Exit fullscreen mode

The schema ships with 7 views, 6 stored procedures, and 7 triggers — including one that blocks enrollment when a section is full, and three that write JSON diffs to the audit_log table.


Four levels of example queries

I structured the example queries into four files so the database works for everyone from first-day learners to DBAs:

Level 1 — Basics: SELECT, WHERE, GROUP BY, single-table aggregation

Level 2 — Intermediate: multi-table JOIN (up to 5 tables), correlated subqueries, FULLTEXT search, JSON_VALUE

Level 3 — Advanced: window functions, CTEs, recursive CTEs, JSON_TABLE, SET/FIND_IN_SET, VEC_Distance similarity search

Level 4 — DBA/Developer: EXPLAIN ANALYZE, index strategy, stored procedure authoring, transaction isolation levels, audit log forensics

A level 3 query to find the full prerequisite chain for a course looks like this:

WITH RECURSIVE prereq_chain AS (
    SELECT cp.prerequisite_id,
           p.code   AS prereq_code,
           p.title  AS prereq_title,
           1        AS depth
    FROM   course_prerequisites cp
    JOIN   courses p ON p.course_id = cp.prerequisite_id
    WHERE  cp.course_id = (SELECT course_id FROM courses WHERE code = 'CS300')

    UNION ALL

    SELECT cp2.prerequisite_id,
           p2.code,
           p2.title,
           pc.depth + 1
    FROM   course_prerequisites cp2
    JOIN   prereq_chain         pc ON pc.prerequisite_id = cp2.course_id
    JOIN   courses              p2 ON p2.course_id = cp2.prerequisite_id
    WHERE  pc.depth < 10
)
SELECT DISTINCT depth, prereq_code, prereq_title
FROM   prereq_chain
ORDER  BY depth, prereq_code;
Enter fullscreen mode Exit fullscreen mode

And a vector similarity search to find courses related to a given one:

SELECT c.code, c.title,
       VEC_Distance(ref.embedding, c.embedding) AS distance
FROM   courses ref
JOIN   courses c ON c.course_id <> ref.course_id
WHERE  ref.code = 'CS101'
ORDER  BY distance
LIMIT  5;
Enter fullscreen mode Exit fullscreen mode

Try it right now

You don't need to install anything to explore the schema.

You can run queries against University DB directly in sqlize.online — my online SQL editor that supports MariaDB 11.7. Paste any query from the example files and see results immediately.

If you want a more structured learning experience with exercises and instant feedback, check out sqltest.online — designed for exactly this kind of hands-on SQL practice.


Get the database

Everything is open-source under the MIT license:

👉 github.com/rozhnev/university-db

The repository includes:

  • 01_schema.sql — all 16 tables
  • 02_objects.sql — views, procedures, triggers
  • 03_seed_small.sql — static seed data
  • generate_data.py — Python/Faker script to populate ~130 000 rows
  • docker-compose.yml — one command to get a running database
  • queries/level1.sql through level4.sql — 50+ example queries
git clone https://github.com/rozhnev/university-db.git
cd university-db
cp .env.example .env
docker compose up --build
Enter fullscreen mode Exit fullscreen mode

How to contribute

The project is open source and contributions of any kind are welcome.

Found a bug?Open an issue on GitHub. Mistakes in the schema, incorrect queries, typos in comments — all worth reporting. The more specific the description, the faster it gets fixed.

Have an interesting query? — Send a pull request to queries/. Level 3–4 examples are especially valuable: window functions, recursive CTEs, JSON forensics, vector search.

Want to improve the data generator?generate_data.py is intentionally kept simple. More realistic grade distributions, additional data scenarios, or faster bulk-insert batching are all good targets.

Need support for another environment? — A Kubernetes manifest, Helm chart, or a setup script for a cloud managed service (RDS, Cloud SQL, PlanetScale) would be a useful addition.

Educational materials? — Exercises with solutions, workshop slides, or Jupyter notebooks built on this schema are all welcome.

Fork the repository and send a pull request — code review within a few days.


Is Sakila dead?

Not at all. It's still a perfectly valid database for learning basic SQL, and its portability (it runs on any MySQL 5.x+ install) is a genuine advantage. But as a primary teaching tool for modern MariaDB, it's showing its age.

University DB fills the gap for anyone who wants to teach or learn the full surface area of what MariaDB 11.7 can do — from a first SELECT to a vector similarity search in a recursive CTE inside a stored procedure.

I hope it's useful. Feedback, issues, and pull requests are very welcome.


Slava Rozhnev — sqlize.online · sqltest.online · GitHub

Top comments (0)