HelperX

Posted on Jun 18

Multi-Tenant Data Isolation in SQLite: Per-User Database Files vs Row-Level

#database #sqlite #architecture #saas

When you build a multi-tenant SaaS, the first big architectural decision is how to isolate tenant data. With Postgres, the question is usually "shared schema with tenant_id, schema-per-tenant, or database-per-tenant." With SQLite, the trade-offs shift because the operational characteristics are completely different.

For HelperX, we went all the way to one SQLite database file per tenant instead of using row-level isolation. A year later it's the design decision I'd most strongly defend. This article is the case for that choice — when it works, when it breaks, and the specific patterns that make it production-ready.

The two ends of the spectrum

Multi-tenant isolation in SQLite breaks down to two real options:

Option A: Shared database with row-level isolation

A single app.db file with a tenant_id column on every table. Every query filters by tenant_id = ?. Standard pattern in most SaaS frameworks.

Option B: Database file per tenant

One SQLite file per tenant: data/tenants/<tenant_id>.db. Each file is structurally identical but completely isolated. Queries don't need WHERE tenant_id because there's nothing else in the database.

(There's a third option — schema-per-tenant in a shared database — but SQLite doesn't have schemas in the Postgres sense, so it's not really available.)

The naive view is that Option A is "easier" and Option B is "weird." After running B in production for a year, my view is the opposite: B is dramatically simpler operationally, and A introduces a whole class of bugs that B makes structurally impossible.

Why row-level isolation is dangerous

The fundamental problem with row-level isolation is that the isolation is enforced at every query, by application code. Every SELECT, every UPDATE, every DELETE has to include WHERE tenant_id = ?. Miss it once, and you have cross-tenant data leakage.

In real codebases, you miss it. Eventually. Common ways:

// 1. The "I'll add the filter later" bug
const allConfig = db.prepare('SELECT * FROM config WHERE key = ?').all(key);
// Forgot: WHERE tenant_id = ? AND key = ?

// 2. The JOIN that lost the filter
const result = db.prepare(`
  SELECT a.*, b.name FROM actions a
  JOIN modules b ON a.module_id = b.id
  WHERE a.tenant_id = ?
`).all(tenantId);
// `modules` table also has tenant_id but isn't filtered

// 3. The UPDATE without WHERE
db.prepare('UPDATE config SET value = ? WHERE key = ?').run(newValue, key);
// Updated every tenant's config

// 4. The aggregate query
const counts = db.prepare('SELECT COUNT(*) FROM audit_log').all();
// Counted across all tenants

You can defend against these with ORM helpers, row-level security policies, code review, and tests. None of those defenses are 100%. The probability of a leakage bug over a long time horizon is high.

Worse, the leakage might not be caught immediately. A bug that aggregates data across tenants in a background job might run silently for weeks before anyone notices. By then, the damage is done — both reputationally and possibly legally.

Why per-database isolation is structurally safe

With Option B, the cross-tenant leakage failure mode doesn't exist:

function getTenantDb(tenantId) {
  return new Database(`data/tenants/${tenantId}.db`);
}

const db = getTenantDb(tenantId);
const counts = db.prepare('SELECT COUNT(*) FROM audit_log').all();
// Counted within this tenant only — there is no other data

The query can't accidentally include other tenants' data because the database file contains only this tenant's data. There's no WHERE tenant_id to forget, no JOIN that could lose the filter, no aggregate that could spill.

The application-level isolation is replaced by filesystem-level isolation. The OS does the work for you.

The trade-offs

It's not free. Per-database isolation has real costs:

Concern	Shared DB	Per-tenant DB
Cross-tenant queries	Easy (single query)	Hard (loop over DBs)
Schema migrations	One migration run	N migrations
Connection management	One pool	N connections (or lazy open)
Disk usage	Slightly lower (deduplicated indexes)	Slightly higher
Operational simplicity	Worse (one big DB)	Better (small isolated files)
Backup granularity	Whole DB	Per tenant
Tenant deletion	DELETE queries	`rm tenant.db`
Cross-tenant analytics	One query	Aggregate across files

The two columns that often look like "wins" for shared DB are the only ones we struggled with: cross-tenant queries and migrations.

Cross-tenant queries

For HelperX, the cross-tenant queries we need are things like:

Total active slots across the platform
Aggregate engagement metrics for the homepage
Billing rollups (actions per tenant per month)
System health (slow queries, error rates) across tenants

These all use either:

The global database (which we keep for cross-tenant data) — billing, users, plans
Aggregated stats periodically written to global — each tenant's worker writes its daily summary to global.db

The pattern: don't aggregate across many tenant DBs in real-time. Instead, each tenant's worker periodically (every hour, or after big operations) writes a summary row to the global DB. Queries that need cross-tenant data read from the summary, not from the source tables.

// Tenant worker: periodically write summaries
async function writeSummary(tenantId) {
  const tenantDb = getTenantDb(tenantId);

  const stats = tenantDb.prepare(`
    SELECT 
      COUNT(*) as total_actions,
      COUNT(CASE WHEN status = 'success' THEN 1 END) as success_count,
      AVG(duration_ms) as avg_duration
    FROM audit_log 
    WHERE timestamp > datetime('now', '-1 hour')
  `).get();

  globalDb.prepare(`
    INSERT OR REPLACE INTO tenant_hourly_stats 
    (tenant_id, hour, total_actions, success_count, avg_duration)
    VALUES (?, ?, ?, ?, ?)
  `).run(tenantId, currentHour(), stats.total_actions, stats.success_count, stats.avg_duration);
}

The global DB then answers cross-tenant queries:

// Cross-tenant query against summary table
const totals = globalDb.prepare(`
  SELECT SUM(total_actions) as platform_total, COUNT(DISTINCT tenant_id) as active_tenants
  FROM tenant_hourly_stats 
  WHERE hour > strftime('%s', 'now', '-24 hours')
`).get();

This adds latency to summaries (they're stale by up to an hour) but makes cross-tenant analytics trivially fast.

Migrations

Running migrations against N database files sounds nightmarish. In practice, it's straightforward — but you have to design for it.

We use a migration system where each tenant database tracks its own version:

const MIGRATIONS = [
  { version: 1, up: `CREATE TABLE config (key TEXT PRIMARY KEY, value TEXT);` },
  { version: 2, up: `ALTER TABLE audit_log ADD COLUMN metadata TEXT;` },
  { version: 3, up: `CREATE INDEX idx_audit_module ON audit_log(module, status);` },
];

function migrateDb(db) {
  db.exec(`
    CREATE TABLE IF NOT EXISTS _migrations (
      version INTEGER PRIMARY KEY,
      applied_at TEXT DEFAULT (datetime('now'))
    );
  `);

  const current = db.prepare('SELECT MAX(version) as v FROM _migrations').get().v ?? 0;

  for (const mig of MIGRATIONS) {
    if (mig.version > current) {
      try {
        db.exec(mig.up);
        db.prepare('INSERT INTO _migrations (version) VALUES (?)').run(mig.version);
      } catch (err) {
        throw new Error(`Migration ${mig.version} failed: ${err.message}`);
      }
    }
  }
}

This runs lazily on every database open. The first time a tenant DB is opened after deployment, it migrates. Each migration takes 10-50ms; the latency cost is invisible.

For tenants that aren't accessed in months, the migration runs whenever they're next opened. That's the right behavior — pay the cost when the database is needed, not in a maintenance window.

What about destructive migrations?

The trickiest case is migrations that need data transformation:

{ 
  version: 4, 
  up: `
    BEGIN TRANSACTION;
    CREATE TABLE config_new (key TEXT PRIMARY KEY, value TEXT, type TEXT NOT NULL DEFAULT 'string');
    INSERT INTO config_new (key, value) SELECT key, value FROM config;
    DROP TABLE config;
    ALTER TABLE config_new RENAME TO config;
    COMMIT;
  `
}

Each tenant DB handles this transaction independently. If a tenant DB has corrupted data that breaks the migration, only that tenant's migration fails. The rest continue working. Compare to a shared DB: one corrupted row blocks everyone.

The connection management question

200 active SQLite files. How do you handle connections?

Naive approach: open on demand, close after each query.

function withTenantDb(tenantId, fn) {
  const db = new Database(`data/tenants/${tenantId}.db`);
  try {
    return fn(db);
  } finally {
    db.close();
  }
}

Works, but opens/closes a file handle every query. For workers running thousands of queries per hour per tenant, this adds up.

Better: cache open connections.

const connections = new Map();
const lastAccess = new Map();

function getTenantDb(tenantId) {
  const cached = connections.get(tenantId);
  if (cached) {
    lastAccess.set(tenantId, Date.now());
    return cached;
  }

  const db = new Database(`data/tenants/${tenantId}.db`);
  db.pragma('journal_mode = WAL');
  db.pragma('synchronous = NORMAL');
  db.pragma('busy_timeout = 5000');

  connections.set(tenantId, db);
  lastAccess.set(tenantId, Date.now());
  return db;
}

// Periodic cleanup of idle connections
setInterval(() => {
  const cutoff = Date.now() - 10 * 60 * 1000; // 10 min idle
  for (const [tenantId, ts] of lastAccess) {
    if (ts < cutoff) {
      const db = connections.get(tenantId);
      if (db) {
        db.close();
        connections.delete(tenantId);
        lastAccess.delete(tenantId);
      }
    }
  }
}, 60 * 1000);

Keeps frequently-accessed databases open, closes the ones that are idle. Memory footprint stays bounded.

For 200 active slots, we typically have 30-50 connections open at any moment. The rest close after idle timeout. Memory cost is roughly 2-5 MB per open connection.

Tenant lifecycle

The clarity of per-tenant DBs really shows in tenant operations.

Creating a tenant

function createTenant(tenantId) {
  const dbPath = `data/tenants/${tenantId}.db`;
  if (fs.existsSync(dbPath)) {
    throw new Error('Tenant already exists');
  }
  const db = new Database(dbPath);
  db.pragma('journal_mode = WAL');
  migrateDb(db); // applies all migrations at current version
  db.close();
}

One file create, schema initialization in milliseconds. No "CREATE TENANT" rituals, no row insertion across many tables.

Deleting a tenant

function deleteTenant(tenantId) {
  closeTenantDb(tenantId); // close any cached connection
  const dbPath = `data/tenants/${tenantId}.db`;
  const walPath = `${dbPath}-wal`;
  const shmPath = `${dbPath}-shm`;

  fs.unlinkSync(dbPath);
  if (fs.existsSync(walPath)) fs.unlinkSync(walPath);
  if (fs.existsSync(shmPath)) fs.unlinkSync(shmPath);
}

Three file deletes. Compare to row-level deletion across 12 tables with foreign key constraints. The shared-DB version is far more bug-prone (forget a table, orphaned rows).

GDPR data deletion requests in particular are dramatically simpler with per-tenant files — rm and you're done.

Exporting tenant data

GDPR "right to data portability" requests are trivial: zip the tenant's DB file and send it. Customer wants a backup of their data: cp tenant.db backup.db. Customer wants to move to a different server: copy the file.

With shared DBs, exporting one tenant's data requires N SELECT queries across N tables, joined back into a consistent structure. We've seen teams spend weeks building "tenant data export" features. With per-tenant files, the feature is shell scripting.

When per-tenant DBs break

The pattern doesn't work for every workload. Specifically:

1. High write concurrency within a tenant.

SQLite serializes writes within a database. If multiple processes write to the same tenant DB simultaneously, they queue. For workloads where one tenant might have 50 concurrent writers, you'll bottleneck.

For HelperX, each tenant has exactly one worker writing to it. Concurrent writes within a tenant don't happen. If your workload has multiple workers per tenant, this matters.

2. Cross-tenant transactions.

If you need atomic operations spanning multiple tenants (rare in most SaaS), you can't do it. Each DB is its own transaction boundary. You'd need a transaction coordinator, which defeats the point of using SQLite.

3. Very high tenant counts.

SQLite handles many files fine, but at 50,000+ tenants you'll start hitting filesystem limits (inode counts, directory size). Sharding into subdirectories helps but adds complexity. Below 5,000 tenants you're nowhere near this.

4. Tenants that share data.

If "tenant X needs to read tenant Y's data" is a real product requirement, per-tenant DBs are awkward. The shared DB is built for this.

For HelperX, none of these break the design. Each tenant is fully isolated by product design (each tenant = one X account being automated), there's never a reason for one tenant's worker to read another's data, and we're orders of magnitude below the file count limit.

Performance comparison

We migrated a small test workload from row-level isolation to per-tenant DBs early in development. The numbers from that test:

Query	Shared DB (200 tenants)	Per-tenant DB	Difference
Read latest 100 audit rows (one tenant)	1.8 ms	0.3 ms	6x faster
Insert single audit row	0.4 ms	0.04 ms	10x faster
Update config value	0.6 ms	0.05 ms	12x faster
Cross-tenant count (rare)	2.1 ms	45 ms	21x slower

The per-tenant version is dramatically faster for tenant-scoped queries (our 99% case) and slower for cross-tenant queries (our 1% case). The cross-tenant slowness is mitigated by the summary-table pattern.

The reason per-tenant is so much faster for scoped queries: the database is tiny. Index scans are cheaper because there's less to scan. The query planner has fewer options to consider. Pages are more likely to be hot in OS cache.

What about Postgres row-level security?

Postgres has built-in row-level security (RLS) policies that enforce tenant isolation at the database layer:

ALTER TABLE audit_log ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON audit_log
  USING (tenant_id = current_setting('app.tenant_id')::int);

It's a real defense against application-layer mistakes. But it has its own complexity: you have to remember to SET app.tenant_id = ? at the start of every request, RLS policies are tricky to debug, and they add query planner overhead.

The argument for per-tenant SQLite over Postgres RLS: simplicity. The file system gives you isolation for free. No policies to maintain, no SET statements to remember, no policy bugs to debug.

For workloads that fit per-tenant SQLite's constraints, it's the dramatically simpler solution. For workloads that don't, Postgres + RLS is the right tool.

Key takeaways

Per-tenant SQLite files give you filesystem-level isolation — cross-tenant leakage becomes structurally impossible.
Row-level isolation depends on every query getting it right. Probability of a bug over a long horizon is high.
Cross-tenant queries use a global summary table, populated periodically by tenant workers. Same tradeoff as eventual consistency in distributed systems.
Migrations run lazily on database open. Each tenant migrates independently.
Connection caching with idle timeout keeps memory bounded while staying fast.
Tenant create/delete/export are file operations, not bulk SQL statements.
Per-tenant is dramatically faster for tenant-scoped queries — typically 5-10x.
It breaks for high write concurrency per tenant, cross-tenant transactions, or very high tenant counts (50K+).

The simplicity is the whole point. The fewer places isolation can fail, the more confident you can be it never will.

HelperX runs one SQLite per tenant slot — full isolation, zero shared mutable state. Self-hosted, free 30-day trial.

DEV Community