pickuma
Posted on • Originally published at pickuma.com

LiteFS and Distributed SQLite: How Cross-Region Replication Actually Works

SQLite is a single file on a single disk. That is its whole appeal and, for anyone running a multi-region app, its whole problem. If your database is a file on one machine in Virginia, your users in Singapore pay a round trip across the planet on every read. LiteFS exists to break that constraint without making you give up the embedded, zero-network-hop model that made SQLite worth using in the first place.

LiteFS is a FUSE filesystem from Fly.io that sits underneath your application's SQLite file and replicates it across nodes. Your app still calls sqlite3_open() on a normal-looking path. Underneath, LiteFS is intercepting the writes and shipping them to every replica. This piece walks through how that actually works, where it bites you, and how it compares to the Raft-based alternatives.

How LiteFS replicates a file you think is local

The trick is the FUSE layer. LiteFS mounts a directory, and SQLite reads and writes its database through that mount. Because LiteFS is the filesystem, it sees every page-level change SQLite makes inside a transaction. When a transaction commits, LiteFS packages the changed pages into an LTX file — a Lite Transaction file — and assigns it a monotonically increasing transaction ID (TXID).

Replicas don't replay SQL. They apply LTX files. Each replica tracks its current position as a TXID plus a checksum of the database state, and it pulls the next LTX file from the primary in order. Because the unit of replication is a set of physical pages rather than a statement, replicas stay byte-for-byte identical to the primary — there is no "non-deterministic RANDOM() drifted my replica" class of bug that logical replication can introduce.
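To make the mechanism concrete, here is a toy model of page-level replication in Python. It is a sketch under loud assumptions, not LiteFS's on-disk format: `LtxFile`, `Primary`, `Replica`, and `db_checksum` are hypothetical names, and a SHA-256 over the whole page set stands in for whatever checksum LiteFS really uses. What it does show is the shape of the protocol described above: the primary ships changed pages with a TXID, and a replica refuses anything out of order or anything that would leave it in a different state than the primary.

```python
import hashlib
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # illustrative; SQLite's page size is configurable


def db_checksum(pages: dict[int, bytes]) -> str:
    """Hash the full database state (toy stand-in for the real checksum)."""
    h = hashlib.sha256()
    for pgno in sorted(pages):
        h.update(pgno.to_bytes(4, "big"))
        h.update(pages[pgno])
    return h.hexdigest()


@dataclass(frozen=True)
class LtxFile:
    """Toy transaction file: the changed pages plus the expected end state."""
    txid: int
    pages: dict[int, bytes]
    post_checksum: str


@dataclass
class Node:
    pages: dict[int, bytes] = field(default_factory=dict)
    txid: int = 0

    def position(self) -> tuple[int, str]:
        """Replication position: current TXID plus a checksum of the state."""
        return self.txid, db_checksum(self.pages)


class Primary(Node):
    def commit(self, changed: dict[int, bytes]) -> LtxFile:
        """Apply a transaction locally and package its pages for shipping."""
        self.pages.update(changed)
        self.txid += 1
        return LtxFile(self.txid, dict(changed), db_checksum(self.pages))


class Replica(Node):
    def apply(self, ltx: LtxFile) -> None:
        """Apply the next transaction file; refuse gaps or diverged state."""
        if ltx.txid != self.txid + 1:
            raise ValueError(f"out of order: have {self.txid}, got {ltx.txid}")
        candidate = {**self.pages, **ltx.pages}
        if db_checksum(candidate) != ltx.post_checksum:
            raise ValueError("checksum mismatch: replica has diverged")
        self.pages, self.txid = candidate, ltx.txid


primary, replica = Primary(), Replica()
for changed in ({1: b"a" * PAGE_SIZE}, {1: b"c" * PAGE_SIZE, 2: b"b" * PAGE_SIZE}):
    replica.apply(primary.commit(changed))
print(replica.position() == primary.position())
```

Note that nothing in `apply` knows what SQL produced the pages. That is the whole point of shipping physical pages instead of statements.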

The topology is deliberately simple: one primary, many read replicas. The primary is chosen by a lease. You can run a distributed lease backed by Consul, which lets the primary move automatically if a node dies, or a static lease where you pin the primary to a known node and accept manual failover. Consul leasing is what makes LiteFS tolerate the primary disappearing; static leasing is simpler to reason about but means a dead primary is a dead writer until you intervene.
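The difference between the two lease styles is easier to see in a toy model. The sketch below is an in-memory stand-in for a lease service, not Consul's API and not LiteFS's implementation; `LeaseStore`, `acquire`, and `primary` are hypothetical names, and time is passed in explicitly so the behavior is deterministic. A lease with a TTL lets another node take over once the holder stops renewing, while a static lease is the degenerate case where the holder never changes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: str
    expires_at: float


class LeaseStore:
    """In-memory stand-in for a distributed lease service (hypothetical API)."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self._lease: Optional[Lease] = None

    def acquire(self, node: str, now: float) -> bool:
        """Take the lease if it is free or expired; renew it if we hold it."""
        lease = self._lease
        if lease is None or lease.expires_at <= now or lease.holder == node:
            self._lease = Lease(node, now + self.ttl)
            return True
        return False

    def primary(self, now: float) -> Optional[str]:
        """Return the current lease holder, or None if the lease has lapsed."""
        lease = self._lease
        if lease is not None and lease.expires_at > now:
            return lease.holder
        return None


store = LeaseStore(ttl=10.0)
store.acquire("iad", now=0.0)         # iad becomes primary
store.acquire("fra", now=5.0)         # refused: iad still holds the lease
store.acquire("fra", now=12.0)        # iad stopped renewing, so fra takes over
print(store.primary(now=13.0))
```

The window between the old holder dying and its lease expiring is the time during which nobody can write, which is the trade-off behind choosing a TTL.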

LiteFS replicates at the page level, not the statement level. That means every replica is an exact physical copy of the primary's database file, and it also means you cannot have a replica with a different schema, a different index, or a partial subset of the data. It is full-database replication or nothing.

The single-writer tax

Here is the part that surprises people coming from Postgres or MySQL: only the primary can write. Every replica is strictly read-only. If a request lands on a node in Frankfurt and tries to INSERT, the local SQLite file rejects it, because that node does not hold the write lease.

LiteFS handles this with write forwarding. On Fly.io, the LiteFS proxy inspects the request, and if it is a write that arrived on a replica, it returns a fly-replay header that tells the platform to re-run the request in the primary's region. Your app code can stay oblivious: the write just happens to execute a few dozen milliseconds later, in another region. Off Fly.io, you wire equivalent routing yourself, typically by checking for the .primary file in the LiteFS mount (it is present only on replicas and names the current primary) and proxying writes to that node.
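If you are wiring that routing yourself, the decision can be as small as the sketch below. It rests on two assumptions you should verify against your own setup: that a `.primary` file exists in the mount directory only on replicas and contains the primary's hostname, and that treating non-GET HTTP methods as writes is good enough for your app. The `route` function and `WRITE_METHODS` set are hypothetical names for illustration.

```python
from pathlib import Path
from typing import Optional

# Assumption: HTTP method is a good enough proxy for "this request writes".
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


def route(method: str, mount_dir: Path) -> tuple[str, Optional[str]]:
    """Decide where a request should run.

    Returns ("local", None) to handle it on this node, or
    ("forward", host) to send it to the primary named in `.primary`.
    """
    if method.upper() not in WRITE_METHODS:
        return ("local", None)  # reads are always served locally
    try:
        host = (mount_dir / ".primary").read_text().strip()
    except FileNotFoundError:
        # Assumption: no .primary file means this node is the primary.
        return ("local", None)
    return ("forward", host)
```

On Fly.io the "forward" branch corresponds to answering with a fly-replay response; elsewhere it would be a reverse-proxy hop to the returned host. Either way, the application handler itself never needs to know which node it is running on.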

The second half of the tax is replication lag. Replication is asynchronous. The primary acknowledges a commit as soon as it is durable locally, then ships the LTX file. A replica might be tens of milliseconds behind, or seconds behind if it was partitioned. This produces the classic read-your-own-writes hazard: a user submits a form (write forwarded to the primary), gets redirected, and the redirect reads from a local replica that hasn't caught up yet — so their change appears to have vanished.

LiteFS gives you a tool for this: the replication position is exposed as a TXID, and you can require a read to wait until the local replica has reached at least a given TXID before serving it. You set this up per-request where it matters (the redirect after a write) rather than globally, because forcing every read to wait until the replica has caught up with the primary would throw away the latency win that justified LiteFS in the first place.
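A minimal version of that wait might look like the following. It assumes the replica's position is readable from a file next to the database containing a hex TXID and a checksum separated by a slash; treat the file name and format as assumptions to check against the LiteFS documentation. `read_txid` and `wait_for_txid` are hypothetical helpers, and the timeout values are arbitrary.

```python
import time
from pathlib import Path


def read_txid(pos_file: Path) -> int:
    """Parse a position file assumed to hold "<hex TXID>/<hex checksum>"."""
    txid_hex, _, _checksum = pos_file.read_text().strip().partition("/")
    return int(txid_hex, 16)


def wait_for_txid(pos_file: Path, min_txid: int,
                  timeout: float = 2.0, interval: float = 0.01) -> bool:
    """Poll until the local replica reaches min_txid; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if read_txid(pos_file) >= min_txid:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The usual pattern is for the write request to hand the resulting TXID back to the client (in a cookie, say), and for the next read to call `wait_for_txid` with it before querying. On timeout, the caller can choose between serving slightly stale data and forwarding the read to the primary.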

Async replication means a primary that crashes after acknowledging a commit but before shipping the LTX file takes those transactions with it. If your workload cannot tolerate losing the last few unreplicated writes on a hard primary failure, you need synchronous-style guarantees that LiteFS does not provide. Reach for a Raft-based store instead.

LiteFS versus the Raft crowd

LiteFS is not the only way to make SQLite distributed, and the alternatives make a fundamentally different bet on consistency.

| Project | Replication model | Consistency | Writes | Best fit |
| --- | --- | --- | --- | --- |
| LiteFS | Async, page-level LTX over FUSE | Eventual on replicas, read-your-writes via TXID waits | Single primary, write-forwarded | Read-heavy apps that want global low-latency reads |
| rqlite | Raft consensus, SQL statements | Strong (linearizable reads available) | Leader only, via HTTP API | Apps that need a consensus-backed config/state store |
| dqlite | Raft, C library | Strong | Leader only | Embedded into Go/C systems (Canonical uses it in LXD) |
| libSQL / Turso | Server-side replication, embedded replicas | Tunable | Primary with embedded read replicas | Edge apps wanting a managed SQLite-compatible service |

The split is clean. rqlite and dqlite use Raft, so a write isn't acknowledged until a quorum of nodes has it. You get strong consistency and survive a node loss without losing committed data, but every write pays a consensus round trip, and writes go through the Raft leader (over an HTTP API in rqlite's case) rather than straight into a local file. LiteFS keeps the local-file, embedded experience and the near-zero read latency, and pays for it with async replication and a single writer. libSQL/Turso sits closer to LiteFS philosophically (embedded replicas, fast local reads) but is delivered as a managed service with its own protocol.
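The cost of that consensus round trip is simple arithmetic. The sketch below is a deliberately simplified model with hypothetical function names and made-up latencies: it assumes the leader contacts all followers in parallel, that each follower's round-trip time already includes its own disk write, and it ignores retries and leader election.

```python
def async_ack_ms(local_fsync_ms: float) -> float:
    """Async replication: acknowledge once the commit is durable locally."""
    return local_fsync_ms


def quorum_ack_ms(local_fsync_ms: float, follower_rtts_ms: list[float]) -> float:
    """Quorum replication: acknowledge once a majority of nodes has the write.

    The leader counts toward the majority, so it must hear back from
    (majority - 1) followers; the slowest of those sets the latency.
    """
    cluster_size = 1 + len(follower_rtts_ms)
    majority = cluster_size // 2 + 1
    needed = majority - 1
    if needed == 0:
        return local_fsync_ms  # single-node cluster: nothing to wait for
    fastest = sorted(follower_rtts_ms)[:needed]
    return max(local_fsync_ms, fastest[-1])


# Hypothetical numbers: 1 ms local fsync, followers 12 ms and 80 ms away.
print(async_ack_ms(1.0), quorum_ack_ms(1.0, [80.0, 12.0]))
```

In a three-node cluster the write waits only for the nearest follower, so placing one follower close to the leader keeps the penalty small. With five nodes spread across continents, the second-nearest follower sets the floor.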

If the question is "I want my reads fast everywhere and I do far more reads than writes," LiteFS is the natural pick. If the question is "I cannot lose a committed write," you want Raft.

When distributed SQLite is the right call

LiteFS earns its place when your traffic is read-dominated, your writes can tolerate landing in one region, and you value keeping the database as an embedded file rather than a network service. Content sites, dashboards, read-heavy SaaS, and anything you would otherwise serve from a single big Postgres with read replicas are all good fits. The operational story is genuinely lighter than running a Postgres cluster: there is no connection pooler, no separate database tier, and the database travels with your app process.

It is the wrong call when writes are frequent and globally distributed, when you need multiple writers, or when losing a handful of unreplicated transactions on failover is unacceptable. Those are consensus-database problems, and bolting strong consistency onto an async page-shipping filesystem is fighting the design.

Start with a static lease and a single known primary while you learn the failure modes, then move to Consul leasing once you actually need automatic failover. Debugging an unexpected primary handoff is much harder than debugging a writer you pinned yourself.

