Athreya aka Maneshwar

Posted on Feb 4

Commit Operation: Turning In-Memory Changes into Permanent Reality in SQLite

#webdev #programming #database #architecture

Hello, I'm Maneshwar. I'm working on FreeDevTools online currently building "one place for all dev tools, cheat codes, and TLDRs" — a free, open-source hub where developers can quickly find and use tools without any hassle of searching all over the internet.

Up until commit time, everything SQLite does is deliberately reversible.

Pages are modified in cache.
Before-images sit safely in rollback journals.
Locks signal intent, not finality.

The commit operation is the point where SQLite stops being cautious and starts being decisive.

What’s interesting is that SQLite doesn’t use a single, universal commit protocol. The exact sequence depends on how many database files the transaction modified.

The Single-Database Commit: Simple, Strict, and Fast

Let’s start with the common case: a transaction that modifies exactly one database file.

Read-only transactions are trivial here. To commit a read transaction, the pager simply releases the shared lock (if no other readers or writers exist) and returns to NO_LOCK. The page cache remains intact, which means the next transaction benefits from a warm cache.

Write transactions are where things get serious.

When the tree module decides it’s time to commit, it calls two pager functions in order:

sqlite3PagerCommitPhaseOne
sqlite3PagerCommitPhaseTwo

Together, these two phases define SQLite’s durability protocol.

Phase One: Making the Database Durable

The first thing the pager must do is stop the world.

It attempts to acquire an EXCLUSIVE lock on the database file. If this fails, it means some other connection is still reading the database. In that case, the commit cannot proceed, and SQLite returns SQLITE_BUSY.

This is important: the transaction is not aborted. It stays active and can be retried later.

Once the exclusive lock is acquired, the pager:

Increments the database file-change counter
Writes all dirty pages from the page cache back to the database file

This writeback follows the pager’s disciplined flushing rules. All required journal information already exists, so the pager is free to overwrite database pages.

This step is often described as flush-log-at-commit. The journal already contains everything needed to undo the transaction if something goes wrong.

But there’s a catch.

Most operating systems buffer file writes in memory. Writing pages to the database file does not guarantee they’ve reached the disk platter.

So SQLite forces the issue.

The pager calls fsync on the database file, ensuring that all database writes are physically persisted. This step — flush-database-at-commit — removes the need for redo recovery after a system restart.

At this point, the database file on disk already contains the new data.

But the transaction is still not committed.

Phase Two: Declaring the Commit

The commit point comes from something deceptively simple.

The pager finalizes the rollback journal — by deleting it, truncating it, or invalidating its header.

This is the moment of truth.

SQLite defines the commit point as the instant the rollback journal is finalized. Before this moment, a crash causes rollback. After this moment, rollback is impossible — the transaction is committed.

SQLite assumes that journal finalization is an atomic operation at the OS level. That assumption underpins the entire design.

Once the journal is finalized:

The exclusive lock is released
The pager returns to SHARED_LOCK or NO_LOCK
The page cache remains valid and warm

The transaction is complete.

Multi-Database Commit: SQLite’s Mini Two-Phase Commit

Things get more interesting when a transaction modifies multiple database files.

At this point, SQLite behaves like a distributed database — even though everything is local.

The VM layer acts as the commit coordinator, while each pager performs its own local commit steps.

The goal is simple: all databases commit, or none do.

Coordinated Commit with a Master Journal

Here’s how SQLite achieves atomicity across multiple databases.

First, shared locks are released on databases that were not modified. Only the databases that were updated matter from this point on.

Next, SQLite acquires EXCLUSIVE locks on all modified databases and increments their file-change counters.

Then comes the key mechanism: the master journal.

SQLite creates a master journal file in the same directory as the main database. Its name ends with -mj followed by random hexadecimal digits. This file exists even if the main database itself was not modified.

The master journal contains:

The names of all individual rollback journal files involved in the transaction

SQLite flushes the master journal and the journal directory to disk. This step ensures that the coordination record itself is durable.

Each individual rollback journal is then updated with a reference to the master journal. These journals are flushed as well.

Only now does SQLite flush the actual database files.

At this point:

All databases contain new data
All journals describe how to undo it
The master journal ties them together

The final act is decisive.

SQLite deletes the master journal file and flushes the journal directory.

This deletion is the commit point for multi-database transactions.

Once the master journal is gone:

All individual rollback journals are finalized
Exclusive locks are released
All pagers return to SHARED_LOCK or NO_LOCK

If a crash happens before the master journal is deleted, SQLite will roll back all databases on next open. If the crash happens after deletion, the transaction is committed everywhere.

Journal Finalization Modes

One subtle detail worth calling out is journal finalization behavior.

If journal_mode is set to persist, SQLite doesn’t delete the journal file. Instead, it truncates it to zero length. From the pager’s point of view, this is just as effective — the journal is no longer usable for recovery.

Different mechanism, same semantic outcome.

What Happens When Commit Fails

A failed commit does not mean a failed transaction.

If COMMIT returns SQLITE_BUSY, it simply means another transaction is holding a shared lock and preventing exclusive access.

SQLite does not automatically retry.

The transaction remains active, with all its cached changes intact. It is the application’s responsibility to retry the commit later, once other readers have cleared out.

This design keeps SQLite honest: it never blocks indefinitely, and it never hides lock contention from the application.

My experiments and hands-on executions related to SQLite will live here: lovestaco/sqlite