DEV Community

Cover image for Deep Dive into SQLite’s Internal Architecture
Athreya aka Maneshwar
Athreya aka Maneshwar

Posted on • Edited on

Deep Dive into SQLite’s Internal Architecture

Hello, I'm Maneshwar. I'm building git-lrc, an AI code reviewer that runs on every commit. It is free, unlimited, and source-available on Github. Star Us to help devs discover the project. Do give it a try and share your feedback for improving the product.

SQLite is one of the most widely used embedded databases, powering everything from browsers to mobile apps.

Your program links SQLite and calls functions; SQLite translates SQL into bytecode and executes it inside a virtual machine that manipulates B-trees stored in a single database file.

The main logical components are:

  1. SQL layer — parser, analyzer, and code generator (turns SQL into VDBE bytecode).
  2. VDBE (Virtual DataBase Engine) — the virtual machine that executes the bytecode.
  3. B-Tree layer — data storage abstraction: tables and indexes are B-trees stored in the file.
  4. Pager — manages pages in memory, journal files, atomic commit/rollback, and crash-safety.
  5. VFS (OS interface) — file-system abstraction layer that performs the low-level I/O and locking.
  6. Cache and page cache — memory pages for the DB file; interacts with pager and B-tree.

From SQL to work: parser → bytecode → VM

  • When you execute SQL, SQLite first tokenizes and parses it, runs analysis/semantic checks, then emits portable bytecode (the VDBE program). The VDBE has opcodes for scanning B-trees, reading/writing columns, doing joins, sorting, aggregations, etc.
  • The VM is a stack/register machine implemented in vdbe.c and friends. Execution of this bytecode is how SELECT/INSERT/UPDATE/DELETE actually access the B-trees and pages.

Why this matters: Because SQLite is an embedded library, compiling SQL to a compact bytecode and using an in-process VM avoids IPC/serialization overhead and improves latency for small queries.

On-disk layout: pages and B-trees

  • Single file: an SQLite database (schema, tables, indices, data) is stored in one cross-platform file.
    The file is split into fixed-size pages (default 4096 bytes but configurable). Each table/index is stored in a separate B-tree structure mapped onto those pages.
    Interior pages store keys/ranges, leaf pages hold row payloads (SQLite uses B+tree-like organization).
    The file format (header, page types, cell encodings, variable-length integers) is precisely defined to be stable across versions.

  • Variable-length records and compact storage are used to pack rows efficiently; overflow pages are used when payloads exceed one page.

Pager and crash-safety (journaling)

The pager is the subsystem responsible for reading/writing pages and ensuring atomic commit/rollback across crashes. Two primary journaling models:

  1. Rollback journal (classic journaling)
  • When a transaction begins, the pager writes the original contents of pages that will be modified into a rollback journal file. If a crash happens before commit, the journal is replayed to restore the original DB. On successful commit, the journal is deleted (or overwritten). This provides atomicity and durability.
  1. Write-Ahead Log (WAL) mode (recommended for many concurrent workloads)
  • Instead of writing old page images, changes are appended to a WAL file. Readers read the original DB file plus WAL frames to see the latest committed state; writers append to WAL and commit by marking a frame header. WAL allows concurrent readers and a writer (readers never block writers, and writers don’t block readers) which is a big concurrency improvement over the rollback journal. Periodically, a checkpoint copies WAL frames back into the main DB file and truncates the WAL. The WAL file format, index, and locking rules are explicitly specified.

Key practical differences:

  • Rollback journal: simpler, writer blocks readers during the write/commit phases.
  • WAL: better read/write concurrency, requires shared memory (on many OSes) for coordination and produces .wal and .shm files alongside the DB. WAL files grow until checkpoints run (auto or manual).

Locking & concurrency

  • SQLite coordinates multi-process/thread access via file locks implemented by the VFS layer.
    Lock states like SHARED, PENDING, EXCLUSIVE govern who can read or write. In classic journal mode, a writer will obtain locks that prevent other writers and sometimes readers; in WAL mode, locks are different to allow concurrent readers during a write.
    The docs describe the precise lock transitions and semantics. If a requested lock is unavailable, SQLite returns SQLITE_BUSY (or waits up to a timeout).

  • Shared-cache mode: an optional mode where multiple connections in the same process can share page cache; this changes locking semantics and is rarely needed for typical apps.

Page cache, memory management, and performance

  • Pages read from disk are cached in the page cache. The pager controls what stays in memory and how dirty pages are flushed. PRAGMA settings (e.g., cache_size, synchronous, journal_mode, locking_mode) affect durability vs. speed tradeoffs.
  • SQLite is designed to be efficient for small to medium databases and many embedded workloads. For heavy concurrent writes or huge datasets, a client/server RDBMS may scale better. (See pragmas & performance sections).

Transactions, atomicity, and isolation

  • Transactions are ACID:

    • Atomic: via rollback journal or WAL+checkpoint.
    • Consistent: via SQL constraints and transactional semantics.
    • Isolated: default isolation level is serializable for separate connections (though read_uncommitted pragma and shared-cache can let a connection see uncommitted data). WAL mode lets readers see a consistent snapshot while a writer appends WAL frames.

Important nuance: because SQLite is file-based, multiple processes must coordinate via the filesystem; thus concurrency semantics depend on OS-level locking and the chosen journal mode.

Virtual tables, extensions, and VFS

  • Virtual tables (e.g., FTS, custom modules) let you expose non-standard backends (search indices, CSV files, remote data) via the SQL table abstraction.
  • VFS layer is pluggable; you can write a custom VFS to change how SQLite talks to the OS (useful to embed SQLite on exotic platforms, store DB on custom block devices, or implement custom locking semantics).

Putting It All Together

When you run a SQL command:

  1. SQLite parses and compiles it into bytecode.
  2. The VDBE executes the bytecode step-by-step.
  3. It interacts with B-tree structures to locate or modify data.
  4. The Pager ensures that changes are cached and journaled correctly.

Finally, the VFS writes datasafely to the actual .sqlite file on disk.

git-lrc
*AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.

git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.*

Any feedback or contributors are welcome! It's online, source-available, and ready for anyone to use.

⭐ Star it on GitHub:

GitHub logo HexmosTech / git-lrc

Free, Unlimited AI Code Reviews That Run on Commit

git-lrc logo

git-lrc

Free, Unlimited AI Code Reviews That Run on Commit



git-lrc - Free, unlimited AI code reviews that run on commit | Product Hunt



AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.

git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.

See It In Action

See git-lrc catch serious security issues such as leaked credentials, expensive cloud operations, and sensitive material in log statements

git-lrc-intro-60s.mp4

Why

  • 🤖 AI agents silently break things. Code removed. Logic changed. Edge cases gone. You won't notice until production.
  • 🔍 Catch it before it ships. AI-powered inline comments show you exactly what changed and what looks wrong.
  • 🔁 Build a habit, ship better code. Regular review → fewer bugs → more robust code → better results in your team.
  • 🔗 Why git? Git is universal. Every editor, every IDE, every AI…

Top comments (0)