SQLite Is All You Need for Durable Workflows

#ai #database #sql #systemdesign

When the DBOS team argued that Postgres is all you need for durable execution, they made a solid case: if you already trust your database, you don’t need a separate orchestration tier. But the idea can be pushed further. For a large class of durable systems — especially those running AI agents — SQLite is all you need.

The Durable Part Is the State, Not the Infrastructure

Durable execution is often discussed as if it requires durable infrastructure. In many cases it doesn’t. The durable part is the workflow state. The compute can stay cheap and disposable.

This is a natural fit for workflow systems like Obelisk: workflow progress lives in an execution log, workflows replay from persisted history, and activities can be retried. What matters most is keeping the workflow state around and easy to inspect.
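The replay idea can be sketched in a few lines of Python. This is a minimal illustration, not how Obelisk implements it: the `execution_log` table, its columns, and the `run_step` helper are all hypothetical names chosen for the example. A step that already has a persisted result is replayed from the log; a step that fails records nothing, so a retry simply runs it again.

```python
import json
import sqlite3

def open_log(path=":memory:"):
    """Open (or create) a hypothetical execution log table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS execution_log (
               workflow_id TEXT NOT NULL,
               step        TEXT NOT NULL,
               result      TEXT NOT NULL,  -- JSON-encoded activity result
               PRIMARY KEY (workflow_id, step)
           )"""
    )
    return conn

def run_step(conn, workflow_id, step, activity):
    """Return the persisted result if the step already ran; otherwise run and record it."""
    row = conn.execute(
        "SELECT result FROM execution_log WHERE workflow_id = ? AND step = ?",
        (workflow_id, step),
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # replay: reuse history, skip the side effect
    result = activity()  # if this raises, nothing is recorded and a retry re-runs it
    with conn:  # commit the result in one transaction
        conn.execute(
            "INSERT INTO execution_log (workflow_id, step, result) VALUES (?, ?, ?)",
            (workflow_id, step, json.dumps(result)),
        )
    return result

# Usage: the second call is served from the log, so the activity runs only once.
calls = []

def fetch():
    calls.append(1)
    return {"status": "ok"}

conn = open_log()
first = run_step(conn, "wf-1", "fetch", fetch)
second = run_step(conn, "wf-1", "fetch", fetch)
```

Because the log is an ordinary table, inspecting a workflow's progress is just a `SELECT` against the same file.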

Why SQLite Fits

SQLite is appealing because it gives you transactional durable state without introducing a separate database service:

No network hop to a remote database
No extra control plane
No new operational surface area just to keep workflow progress safe

For many systems, a local database file is exactly the right level of machinery. You write to disk, you get durability, and the file is yours to inspect, back up, or copy around.
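As a rough sketch of what "write to disk, get durability" can look like from Python's standard `sqlite3` module: the `progress` table and the helper names below are assumptions made for the example, and the pragma choices are one reasonable configuration rather than a recommendation from the original design.

```python
import os
import sqlite3
import tempfile

def open_durable(path):
    """Open a local database file with write-ahead logging enabled."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")   # write-ahead log; readers do not block the writer
    conn.execute("PRAGMA synchronous=FULL")   # in WAL mode, sync the WAL on every commit
    conn.execute(
        "CREATE TABLE IF NOT EXISTS progress ("
        "workflow_id TEXT PRIMARY KEY, step INTEGER NOT NULL)"
    )
    return conn

def record_progress(conn, workflow_id, step):
    """Persist the latest completed step in a single transaction."""
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "INSERT OR REPLACE INTO progress (workflow_id, step) VALUES (?, ?)",
            (workflow_id, step),
        )

# Usage: write, close, and reopen the file to confirm the state survived.
db_path = os.path.join(tempfile.mkdtemp(), "workflow.db")
conn = open_durable(db_path)
record_progress(conn, "wf-1", 3)
conn.close()

reopened = sqlite3.connect(db_path)
saved_step = reopened.execute(
    "SELECT step FROM progress WHERE workflow_id = ?", ("wf-1",)
).fetchone()[0]
reopened.close()
```

There is no connection string, no credentials, and no server process in this sketch; the whole state is the file at `db_path`.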

Litestream Makes It Portable

The obvious concern: what do you do with those SQLite files as experiments accumulate? That’s where Litestream comes in. It streams SQLite changes asynchronously to S3-compatible object storage. That gives you a simple way to keep working state close to the runtime while still copying databases out for backup, migration, and inspection.

# litestream.yml
dbs:
  - path: /var/lib/obelisk/workflow.db
    replicas:
      - url: s3://my-bucket/workflows/

The caveat: Litestream replication is asynchronous. A restore can miss the newest local writes if the SQLite volume disappears before they are copied. For many AI and experimentation workflows, that’s fine — but it’s not the same as a highly available shared database.
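To make the loss window concrete, here is a toy model in Python. It does not use or imitate Litestream's actual mechanism; the `AsyncReplicatedLog` class and its methods are hypothetical and only model the ordering of events: local writes land immediately, the replica catches up later, and anything written in between is gone if the local volume disappears.

```python
from dataclasses import dataclass, field

@dataclass
class AsyncReplicatedLog:
    """Toy model: local writes are visible at once, the replica lags behind."""
    local: list = field(default_factory=list)
    replica: list = field(default_factory=list)

    def write(self, entry):
        self.local.append(entry)  # acknowledged before any copy is made

    def replicate(self):
        # Ship every entry the replica has not seen yet.
        self.replica.extend(self.local[len(self.replica):])

    def lose_volume_and_restore(self):
        """Drop local state, restore from the replica, and report what was lost."""
        lost = self.local[len(self.replica):]
        self.local = list(self.replica)
        return lost

log = AsyncReplicatedLog()
log.write("step-1")
log.write("step-2")
log.replicate()       # replica now holds step-1 and step-2
log.write("step-3")   # not yet copied
lost = log.lose_volume_and_restore()
```

In this run `lost` contains only `"step-3"`, the write made after the last replication pass. Whether that window is acceptable depends on how expensive it is to redo the most recent steps.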

Why This Works Well for AI Agents

This is especially attractive for AI agent and AI-generated workflows. Those systems are often:

Bursty: they run heavily for a while, then sit idle
Experimental: you want to iterate fast without infrastructure lock-in
Isolated: easier to reason about when each agent or tenant has a small, self-contained unit of state

A fleet of tiny servers in micro-VMs or containers, each with its own SQLite database and object storage backup, is often a better fit than one large always-on shared system. It’s simpler, cheaper, and gives better fault isolation: a failure in one server touches only that server's database.
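The one-database-per-agent layout can be sketched as follows. For illustration this collapses the fleet onto a single machine, with one file per agent in a shared directory; the `events` table, the helper names, and the ID format are assumptions of the example, not part of any particular workflow server.

```python
import re
import sqlite3
import tempfile
from pathlib import Path

def db_path_for(base_dir, agent_id):
    """Map an agent ID to its own database file, rejecting IDs that could escape base_dir."""
    if not re.fullmatch(r"[A-Za-z0-9_-]+", agent_id):
        raise ValueError(f"unsafe agent id: {agent_id!r}")
    return Path(base_dir) / f"{agent_id}.db"

def open_agent_db(base_dir, agent_id):
    """Open the agent's private database, creating a hypothetical events table if needed."""
    conn = sqlite3.connect(db_path_for(base_dir, agent_id))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "seq INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
    )
    return conn

# Usage: two agents, two files; writes to one never appear in the other.
base = tempfile.mkdtemp()
alpha = open_agent_db(base, "agent-alpha")
beta = open_agent_db(base, "agent-beta")

with alpha:
    alpha.execute("INSERT INTO events (payload) VALUES (?)", ("called search tool",))

alpha_count = alpha.execute("SELECT COUNT(*) FROM events").fetchone()[0]
beta_count = beta.execute("SELECT COUNT(*) FROM events").fetchone()[0]
db_files = sorted(p.name for p in Path(base).glob("*.db"))
alpha.close()
beta.close()
```

Each file is an independent unit: it can be backed up, deleted, or moved without touching any other agent's state.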

The operating model: run a workflow server with a SQLite database, back it up with Litestream, and let an observer pull interesting databases when needed. The same file can be used for local replay, debugging, and understanding what an agent actually did.
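On the observer side, a pulled database file can be opened read-only so that inspection cannot alter the history. The sketch below assumes the file has already been restored to local disk, and the `execution_log` table with `seq`, `step`, and `result` columns is a hypothetical schema built here only to have something to read.

```python
import os
import sqlite3
import tempfile
from pathlib import Path

def read_history(db_file):
    """Open a restored database read-only and return its steps in order."""
    uri = Path(db_file).resolve().as_uri() + "?mode=ro"  # SQLite URI filename, read-only
    conn = sqlite3.connect(uri, uri=True)
    try:
        return conn.execute(
            "SELECT step, result FROM execution_log ORDER BY seq"
        ).fetchall()
    finally:
        conn.close()

# Build a small sample file standing in for a database pulled from object storage.
sample = os.path.join(tempfile.mkdtemp(), "pulled.db")
writer = sqlite3.connect(sample)
writer.execute(
    "CREATE TABLE execution_log ("
    "seq INTEGER PRIMARY KEY, step TEXT NOT NULL, result TEXT NOT NULL)"
)
with writer:
    writer.executemany(
        "INSERT INTO execution_log (step, result) VALUES (?, ?)",
        [("plan", "ok"), ("call_tool", "timeout"), ("call_tool", "ok")],
    )
writer.close()

history = read_history(sample)
```

Here `history` lists the three recorded steps in the order they were written, which is often enough to see where an agent retried or went off course.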

When to Use Postgres Instead

SQLite is not the answer to every deployment shape. Postgres is the right choice when you need:

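Higher availability: a shared, replicated database serving multiple workers
Broader shared scalability: concurrent writes from many processes or hosts, where SQLite allows only one writer at a time
Synchronous replication: a commit is acknowledged only after a replica has confirmed it, so acknowledged writes survive the loss of the primary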

Many workflow systems don’t need that on day one and shouldn’t start with more infrastructure than their state actually demands.

The Bottom Line

A local SQLite database plus Litestream backup to S3 is enough for a surprisingly large class of durable systems. Add cheap workers around it and you get a durable execution system with very little infrastructure. For the world of AI agents, that may be the most sensible default starting point.

What workflows are you running that might benefit from this approach? Would a local-first, file-based durable execution model work for your use case?