SEN LLC

Posted on Apr 15

SQLite's PRAGMAs You Never Remember — I Put Them in a CLI

#rust #sqlite #cli #tutorial

SQLite's PRAGMAs You Never Remember — I Put Them in a CLI

When you're handed an unknown SQLite file — a SaaS export, a customer backup, a bundled fixture — the first thing you want is "what's in here?". SQLite has PRAGMA commands for every bit of that answer. Nobody remembers them. So I wrote sqlite-stats, a tiny Rust CLI that walks all of them in one command.

📦 GitHub: https://github.com/sen-ltd/sqlite-stats

The binary is about 10 MB of static Alpine musl, three dependencies (clap, rusqlite with bundled, serde_json), and the core inspector is one file of pure PRAGMA calls. This post is half a walkthrough of the tool and half a tour of the parts of SQLite's PRAGMA surface that I always forget.

The problem: inherited databases

At least once a quarter I end up staring at a .db file I didn't create. A user uploads an export of their old app. A SaaS tool drops a backup. A dependency ships test fixtures in SQLite form. A colleague says "hey, remember that script? here's the output".

The honest workflow:

Open sqlite3 mystery.db.
.tables — okay, there's stuff.
.schema users — okay, there's a users table.
"How big is this database actually?" → dig through docs.
"Which table has the most rows?" → write a query per table.
"Is there an index on user_id?" → .schema posts and squint.
"Why is this query slow?" → EXPLAIN QUERY PLAN SELECT ....

Every single one of those is a PRAGMA or a sqlite_master query or a virtual-table query. SQLite exposes an absurdly rich metadata surface; it's just that nobody has a mnemonic for PRAGMA table_info(x) vs PRAGMA index_list(x) vs the dbstat virtual table vs foreign_key_list. sqlite-stats runs all of them and prints a report.

$ sqlite-stats mystery.db
=== Database ===
  path         : mystery.db
  file size    : 128.0 KB (32 pages × 4096 bytes)
  encoding     : UTF-8
  journal mode : delete
  user_version : 0   application_id: 0   foreign_keys: on

=== Tables (5) ===
  name               rows     cols      idx       approx
  comments            500        4        1      10.8 KB
  posts               200        5        2      49.9 KB
  users                50        4        1       1.4 KB
  ...

Same binary, no more squinting.

Design: three dependencies and a bundled SQLite

The repository is deliberately simple:

src/
├── main.rs          entry + exit-code plumbing
├── cli.rs           clap definitions
├── inspector.rs     PRAGMA walks, pure given a Connection
├── explain.rs       EXPLAIN QUERY PLAN → tree
└── formatters.rs    human / json / markdown

Three dependencies in Cargo.toml:

clap       = { version = "4.5", features = ["derive"] }
rusqlite   = { version = "0.32", features = ["bundled"] }
serde_json = "1"

Two points worth dwelling on.

rusqlite with bundled. The bundled feature compiles SQLite's amalgamation directly into the binary. This costs you about 8 MB of build weight but buys you a few things: (1) the binary runs anywhere musl runs with no external libsqlite3 dependency, (2) the bundled build comes with SQLITE_ENABLE_DBSTAT_VTAB turned on, which means sqlite-stats can report per-table payload byte counts from the dbstat virtual table. System SQLite usually does not have dbstat enabled. More on that below.

No sqlx, no tokio, no async. This is a short-lived CLI that opens a database, reads maybe a dozen pragmas, and exits. Async would be pure ceremony.

The release profile is the standard Rust-CLI-for-Alpine recipe:

[profile.release]
strip = true
lto = true
codegen-units = 1
opt-level = "z"
panic = "abort"

Final Alpine image: 10.6 MB.

The inspector is just PRAGMA calls

Here's the core loop that produces database-level info. Pure rusqlite, no clever abstractions:

fn pragma_string(conn: &Connection, name: &str) -> Result<String> {
    let sql = format!("PRAGMA {name}");
    let mut stmt = conn.prepare(&sql)?;
    let mut rows = stmt.query([])?;
    if let Some(row) = rows.next()? {
        let v: rusqlite::types::Value = row.get(0)?;
        Ok(value_to_string(v))
    } else {
        Ok(String::new())
    }
}

pub fn read_database_info(conn: &Connection) -> Result<DatabaseInfo> {
    let page_size      = pragma_i64(conn, "page_size")?;
    let page_count     = pragma_i64(conn, "page_count")?;
    let encoding       = pragma_string(conn, "encoding")?;
    let journal_mode   = pragma_string(conn, "journal_mode")?;
    let user_version   = pragma_i64(conn, "user_version")?;
    let application_id = pragma_i64(conn, "application_id")?;
    let foreign_keys_on = pragma_i64(conn, "foreign_keys")? != 0;

    // PRAGMA database_list gives us the resolved on-disk path.
    let mut path = String::from(":memory:");
    let mut stmt = conn.prepare("PRAGMA database_list")?;
    let mut rows = stmt.query([])?;
    while let Some(row) = rows.next()? {
        let name: String = row.get(1)?;
        if name == "main" {
            if let Some(p) = row.get::<_, Option<String>>(2)? {
                if !p.is_empty() { path = p; }
            }
            break;
        }
    }

    Ok(DatabaseInfo {
        path, page_size, page_count,
        file_size_bytes: page_size * page_count,
        encoding, journal_mode,
        user_version, application_id, foreign_keys_on,
    })
}

The things I always forget and have to look up every time:

PRAGMA page_size × PRAGMA page_count = actual file size, without stat()ing the file. Handy inside networked or read-only filesystems where you don't have std::fs::metadata.
PRAGMA database_list returns a row per attached database (there's always at least main); column 2 is the on-disk path, which is an easy way to get the real file location from just a Connection.
PRAGMA user_version and PRAGMA application_id are two 32-bit slots SQLite reserves for apps to stamp their own schema version / file-magic. Lots of apps use them; most dumps I inspect leave them at 0.
PRAGMA foreign_keys has an annoying property: it's per-connection and defaults to off. The fact that the report says foreign_keys: on reflects whether this inspector connection has enforcement enabled, not whether the database was written with enforcement. That's a footgun I hit while writing the tests.

Walking the schema

User-visible schema items live in sqlite_master. Everything else is reachable from there:

let mut stmt = conn.prepare(
    "SELECT name, type FROM sqlite_master \
     WHERE type IN ('table','view') \
     ORDER BY type, name",
)?;

Given a table name, PRAGMA table_info(x) returns one row per column: cid, name, type, notnull, dflt_value, pk. PRAGMA index_list(x) returns indexes on that table, including the auto-created ones for UNIQUE constraints and primary keys. For each index, PRAGMA index_info(idx) returns the columns it covers. Foreign keys come from PRAGMA foreign_key_list(x).

The one quoting trap: PRAGMA arguments are identifiers, so you have to double-quote them to survive reserved words and special characters. I wrote a tiny helper:

pub fn quote_ident(s: &str) -> String {
    let escaped = s.replace('"', "\"\"");
    format!("\"{escaped}\"")
}

And use it for every dynamic PRAGMA call: PRAGMA table_info({quote_ident(name)}). Without this, a table literally named order breaks the tool. With this, embedded double quotes in table names (rare, but legal) also work.

dbstat: per-table byte counts

The "Top 5 largest tables" section is the thing that actually made me build this. PRAGMA page_count tells you the whole file is 128 KB. It does not tell you which table is eating most of it. For that you need the dbstat virtual table:

SELECT name, SUM(payload) FROM dbstat GROUP BY name;

This works as long as SQLite was built with SQLITE_ENABLE_DBSTAT_VTAB. The bundled build that rusqlite ships with has this on, which was the killer argument for the bundled feature over depending on system SQLite. On macOS system SQLite, dbstat is not available by default; on many Linux distros it isn't either. If you want a tool that Just Works on anyone's machine, you bundle it.

The inspector probes for dbstat once and falls back gracefully:

let dbstat_ok = conn
    .prepare("SELECT name, SUM(payload) FROM dbstat GROUP BY name")
    .is_ok();

if dbstat_ok {
    // fill a BTreeMap<String, i64>
}

If dbstat is missing (e.g. someone built a custom sqlite without the vtab), the "Top 5 largest" fallback is row_count * 64 bytes — a crude estimate, but better than nothing, and clearly labeled (est) in the human output.

EXPLAIN QUERY PLAN as a tree

SQLite returns EXPLAIN QUERY PLAN as a flat result set: (id, parent, notused, detail). parent references the id of a parent step (0 for roots). That's a standard parent-pointer tree, so I turned it into real PlanNode values and rendered them with box-drawing glyphs.

#[derive(Debug, Clone)]
pub struct PlanNode {
    pub row: PlanRow,
    pub children: Vec<PlanNode>,
}

pub fn build_tree(rows: &[PlanRow]) -> Vec<PlanNode> {
    fn children_of(rows: &[PlanRow], parent: i64) -> Vec<PlanNode> {
        rows.iter()
            .filter(|r| r.parent == parent)
            .map(|r| PlanNode {
                row: r.clone(),
                children: children_of(rows, r.id),
            })
            .collect()
    }
    children_of(rows, 0)
}

pub fn render_tree(nodes: &[PlanNode]) -> String {
    fn walk(out: &mut String, nodes: &[PlanNode], prefix: &str) {
        for (i, node) in nodes.iter().enumerate() {
            let last = i + 1 == nodes.len();
            let glyph = if last { "└─ " } else { "├─ " };
            out.push_str(prefix);
            out.push_str(glyph);
            out.push_str(&node.row.detail);
            out.push('\n');
            let child_prefix = if last {
                format!("{prefix}   ")
            } else {
                format!("{prefix}│  ")
            };
            walk(out, &node.children, &child_prefix);
        }
    }
    let mut out = String::new();
    walk(&mut out, nodes, "");
    out
}

Quadratic in the number of steps (each recursive call rescans the flat list), but plans are tiny, so it doesn't matter. Running it against the demo DB:

$ sqlite-stats demo.db --explain-query 'SELECT * FROM posts WHERE user_id = 1'
└─ SEARCH posts USING INDEX idx_posts_user (user_id=?)

On multi-join queries with subqueries you get a real tree with indentation that actually looks like the execution plan. I find this much easier to read than the flat sqlite3 .expert output.

Three formatters, same data

The Report struct is the single source of truth. Three formatters consume it: human (the screenshot above), json (pretty-printed, one big envelope), and markdown (tables and headings you can paste into a PR comment). The markdown variant is the one I actually use most often — "here's what's in the database you sent me, for review" is a surprisingly common Slack message.

The JSON output is also what makes the tool composable with other scripts:

sqlite-stats big.db --format json | jq '.tables | map(select(.row_count > 10000))'

Testing a CLI that touches a database

All 21 integration tests create :memory: databases, populate them with a fixture schema, and assert on the output. No temp files, no cleanup, no flakiness. The fixture builds two tables with a foreign key and a named index, plus a unique constraint that triggers the auto-index path:

fn make_db() -> Connection {
    let conn = Connection::open_in_memory().unwrap();
    conn.execute_batch(r#"
        CREATE TABLE users(
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            email TEXT UNIQUE
        );
        CREATE TABLE posts(
            id INTEGER PRIMARY KEY,
            user_id INTEGER REFERENCES users(id),
            title TEXT NOT NULL,
            body TEXT DEFAULT ''
        );
        CREATE INDEX idx_posts_user ON posts(user_id);
        INSERT INTO users(...) VALUES (...);
    "#).unwrap();
    conn
}

Tests assert that:

read_database_info returns a positive page size and UTF-8 encoding.
inspect sees both tables and skips sqlite_%.
Row counts are exact.
PRAGMA table_info flags NOT NULL and PRIMARY KEY correctly.
PRAGMA index_list finds both the named index and the auto-created unique index.
Foreign keys are resolved.
Each formatter emits its headline section.
The EXPLAIN tree preserves parent/child relationships and renders children with a continuation character (│) in the prefix.
quote_ident escapes embedded quotes.

Trade-offs

A few things sqlite-stats explicitly doesn't do:

Interactive mode. If you want to run queries, sqlite3 is already in $PATH. This tool stays non-interactive so it composes with pipes.
Writes. The CLI opens the database SQLITE_OPEN_READ_ONLY. It won't bump user_version, won't VACUUM, won't repair anything. Safer to reach for, easier to reason about.
Locking-sensitive PRAGMAs. A few PRAGMAs (notably wal_checkpoint) take write locks. The inspector avoids anything that modifies state.
Per-page analysis. dbstat can give you a lot more than payload sums — unused bytes, overflow page count, etc. I deliberately kept the report surface small.

Try it in 30 seconds

git clone https://github.com/sen-ltd/sqlite-stats
cd sqlite-stats
docker build -t sqlite-stats .
docker run --rm -v "$PWD":/work sqlite-stats /work/your.db

Or on a Mac without Docker:

cargo run --release -- your.db

If you're curious and don't have a SQLite file handy, any Firefox profile has one (places.sqlite), any VS Code install has one (state.vscdb), and iMessage on macOS has one (~/Library/Messages/chat.db). Pointing sqlite-stats at state.vscdb is a quick way to realize how much of VS Code's state actually lives in SQLite.

What I'd do next

The most obvious addition is a histogram mode — bucketing column values to characterize "what lives in column X". Second: a --diff mode that compares two databases and shows schema drift. Both are small; both are natural extensions of the Report struct. But the tool as it stands already answers 95% of "what's in this .db file?" inside a single command, which was the goal.

If PRAGMAs are the hidden API of SQLite, sqlite-stats is the cheat sheet you can actually run.

DEV Community

SQLite's PRAGMAs You Never Remember — I Put Them in a CLI

SQLite's PRAGMAs You Never Remember — I Put Them in a CLI

The problem: inherited databases

Design: three dependencies and a bundled SQLite

The inspector is just PRAGMA calls

Walking the schema

dbstat: per-table byte counts

EXPLAIN QUERY PLAN as a tree

Three formatters, same data

Testing a CLI that touches a database

Trade-offs

Try it in 30 seconds

What I'd do next

Top comments (0)