DEV Community

Andrew Huntingdon

Database Design for Multi-Tenant Restaurant Software: Lessons Learned

How we evolved from a simple shared schema to a hybrid approach that balances performance, isolation, and operational complexity

"We need to completely rebuild the database layer."

I was staring at our production database, watching a single poorly-optimized query from one restaurant slow down operations for hundreds of others. It was 2 AM, we'd just onboarded our largest customer yet, and their first day of real usage was bringing our entire platform to its knees.

The problem wasn't the query itself; it was our fundamental architectural decision to share database tables across all tenants with simple tenant_id filtering. This approach had worked fine when we had 20 customers. At 200 customers, with one of them processing thousands of transactions per hour, we were discovering its limitations the hard way.

This is the story of how we evolved BistroBee's database architecture, the painful lessons we learned about multi-tenancy in restaurant software, and the hybrid approach we eventually landed on that balances performance, data isolation, and operational complexity.

The Three Multi-Tenancy Approaches

For developers building SaaS applications, the multi-tenancy question is fundamental: how do you store data for multiple customers in a way that's secure, performant, and maintainable?

The three standard approaches each have clear tradeoffs:

Approach 1: Shared Database, Shared Schema

Every tenant's data lives in the same tables, distinguished by a tenant_id column:

CREATE TABLE restaurants (
  id UUID PRIMARY KEY,
  tenant_id UUID NOT NULL,
  name VARCHAR(255),
  address TEXT,
  -- ... other fields
);

CREATE TABLE shifts (
  id UUID PRIMARY KEY,
  tenant_id UUID NOT NULL,
  restaurant_id UUID REFERENCES restaurants(id),
  staff_member_id UUID,
  start_time TIMESTAMP,
  end_time TIMESTAMP,
  -- ... other fields
);

CREATE INDEX idx_shifts_tenant ON shifts(tenant_id);

Advantages:

  • Simple to implement initially
  • Easy to add new tenants—no infrastructure changes needed
  • Straightforward database migrations affect all tenants at once
  • Cross-tenant analytics and reporting are trivial

Disadvantages:

  • One tenant's bad query affects everyone
  • Data isolation depends entirely on application logic
  • Difficult to tune performance for tenants with vastly different usage patterns
  • Database backups and restores affect all tenants
  • Index bloat as total data grows

This is where we started with BistroBee, and it served us well initially.

Approach 2: Shared Database, Separate Schemas

Each tenant gets their own schema within a shared database:

-- Tenant 1
CREATE SCHEMA tenant_abc123;
CREATE TABLE tenant_abc123.restaurants (...);
CREATE TABLE tenant_abc123.shifts (...);

-- Tenant 2  
CREATE SCHEMA tenant_xyz789;
CREATE TABLE tenant_xyz789.restaurants (...);
CREATE TABLE tenant_xyz789.shifts (...);

Advantages:

  • Better logical isolation than shared schema
  • Per-tenant query performance tuning
  • Easier to understand data boundaries
  • Can restore individual tenant data more easily

Disadvantages:

  • Database migrations require running against every schema
  • Cross-tenant queries become more complex
  • Schema proliferation can cause operational challenges
  • Still subject to noisy neighbor problems at the database level
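In practice, the schema-per-tenant model is usually wired up by setting PostgreSQL's `search_path` when a connection is checked out. A minimal sketch of that routing, assuming a node-postgres pool; the helper names here are hypothetical, not from our codebase:

```javascript
// Derive a schema name from a tenant id. Sanitize aggressively:
// schema names can't be parameterized, so never trust raw input.
function schemaForTenant(tenantId) {
  return 'tenant_' + tenantId.replace(/[^a-z0-9]/gi, '');
}

// Run one query inside the tenant's schema.
async function queryInTenantSchema(pool, tenantId, sql, params) {
  const client = await pool.connect();
  try {
    // set_config lets us pass the schema as a bind parameter
    // instead of interpolating it into a SET statement
    await client.query(
      "SELECT set_config('search_path', $1, false)",
      [schemaForTenant(tenantId)]
    );
    return await client.query(sql, params);
  } finally {
    client.release();
  }
}
```

Because `search_path` sticks to the session, resetting it (or scoping it with `set_config(..., true)` inside a transaction) matters whenever connections are pooled.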

Approach 3: Database Per Tenant

Each tenant gets their own dedicated database:

// Connection routing logic
function getDatabaseConnection(tenantId) {
  const dbConfig = {
    'tenant-abc123': { host: 'db1.example.com', database: 'tenant_abc123' },
    'tenant-xyz789': { host: 'db2.example.com', database: 'tenant_xyz789' }
  };
  return createConnection(dbConfig[tenantId]);
}

Advantages:

  • Complete performance isolation
  • Simplified backup and restore per tenant
  • Easy to move individual tenants between servers
  • Natural data boundaries

Disadvantages:

  • Complex connection management
  • Database migrations across hundreds of databases
  • Higher infrastructure costs
  • Cross-tenant analytics require complex aggregation
  • Operational overhead scales linearly with tenant count

Each approach has advocates and detractors. The right choice depends heavily on your specific requirements, scale, and operational capabilities.

Our Journey: What Actually Happened

Phase 1: Shared Schema Success (Months 1-8)

We launched BistroBee with a straightforward shared schema approach. Every table had a tenant_id column, and our ORM automatically injected tenant filters into every query:

// Sequelize model with automatic tenant scoping
class Shift extends Model {}
Shift.init({
  id: { type: DataTypes.UUID, primaryKey: true },
  tenantId: { type: DataTypes.UUID, allowNull: false },
  restaurantId: DataTypes.UUID,
  staffMemberId: DataTypes.UUID,
  startTime: DataTypes.DATE,
  endTime: DataTypes.DATE
}, {
  sequelize,
  modelName: 'Shift',
  defaultScope: {
    // Automatically filter by tenant
    where: {
      tenantId: sequelize.literal('current_tenant_id()')
    }
  }
});

This worked beautifully for our first dozen customers. Development velocity was high, and database management was straightforward.

Phase 2: The Performance Wall (Months 9-12)

As we grew past 50 customers, problems emerged:

Problem 1: Index bloat
Our shifts table grew to millions of rows. Even with tenant_id indexes, queries were slowing down because the database had to scan through massive amounts of irrelevant data.

-- This query had to scan through all tenants' data
SELECT * FROM shifts 
WHERE tenant_id = 'abc-123' 
  AND start_time > '2024-01-01'
  AND staff_member_id = 'xyz-789';

-- Even with indexes, the query planner struggled
-- EXPLAIN showed sequential scans despite proper indexing

Problem 2: The noisy neighbor
One restaurant chain with 50 locations was processing vastly more data than our typical 1-2 location customers. Their bulk operations would slow the entire system.

Problem 3: Data skew
Our largest tenant had 100x more data than our smallest. Query optimization that worked well for small tenants performed terribly for large ones, and vice versa.

Problem 4: Connection pool exhaustion
During peak hours, we'd hit PostgreSQL connection limits because every request needed a connection regardless of which tenant it served.
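A quick back-of-envelope check would have warned us: by Little's law, the number of connections busy at any instant is roughly requests per second times average query time. A sketch of that arithmetic (the numbers are illustrative, not our actual traffic):

```javascript
// Little's law: busy connections ≈ arrival rate × time per query.
// Using milliseconds keeps the arithmetic in integers.
function estimateBusyConnections(requestsPerSecond, avgQueryMs) {
  return Math.ceil((requestsPerSecond * avgQueryMs) / 1000);
}

// 400 req/s at 50 ms per query keeps ~20 connections busy.
// If a noisy neighbor pushes average latency to 200 ms, that
// quadruples to ~80 — close to PostgreSQL's default limit of 100.
```

The takeaway for us: connection demand scales with latency, so one tenant slowing down queries eats everyone's connection budget.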

Phase 3: The Failed Attempt (Months 13-15)

We decided to move to database-per-tenant. In hindsight, this was an overreaction to performance problems that could have been solved other ways.

Implementation nightmare:

// Connection management became a nightmare
class ConnectionPool {
  constructor() {
    this.pools = new Map();
  }

  getConnection(tenantId) {
    if (!this.pools.has(tenantId)) {
      // Create new connection pool on-demand
      const config = this.getTenantConfig(tenantId);
      this.pools.set(tenantId, new Pool(config));
    }
    return this.pools.get(tenantId);
  }

  // Memory leaks lurked here - we never properly cleaned up pools
}

Migration hell:
Running database migrations across 100+ databases was complex and error-prone. We built tooling to manage it, but the operational overhead was substantial:

# Our migration script
for tenant_db in $(list_tenant_databases); do
  echo "Migrating $tenant_db..."
  DATABASE_URL=$tenant_db npm run migrate
  if [ $? -ne 0 ]; then
    echo "FAILED: $tenant_db"
    # Now what? Rollback? Continue? 
  fi
done

Cost explosion:
Infrastructure costs increased dramatically. We needed connection pools for each database, backup storage multiplied, and monitoring became exponentially more complex.

After three months of struggling with this approach, we made the difficult decision to abandon it and find a hybrid solution.

Phase 4: The Hybrid Approach (Months 16-present)

We landed on an approach that combines the best elements of each strategy:

Tier-based architecture:

  1. Small tenants (1-3 locations): Shared database, shared schema with partitioning
  2. Medium tenants (4-20 locations): Shared database, dedicated schema
  3. Large tenants (20+ locations): Dedicated database

This lets us optimize for the characteristics of each tenant segment.
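The tier decision itself is simple enough to encode directly. A sketch of the routing rule (our published boundaries overlap at exactly 20 locations, so this version resolves that upward; the function name is hypothetical):

```javascript
// Map a tenant's location count to its storage tier.
function tierForTenant(locationCount) {
  if (locationCount >= 20) return 'dedicated';        // own database
  if (locationCount >= 4)  return 'dedicated-schema'; // own schema, shared DB
  return 'shared';                                    // shared schema + partition
}
```

In reality the decision also weighs transaction volume, not just location count, but a crisp default rule keeps tier assignment auditable.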

Technical Implementation: What We Actually Built

Schema Design with Partitioning

For shared schema tenants, we implemented PostgreSQL table partitioning:

-- Parent table
CREATE TABLE shifts (
  id UUID NOT NULL,
  tenant_id UUID NOT NULL,
  restaurant_id UUID,
  staff_member_id UUID,
  start_time TIMESTAMP,
  end_time TIMESTAMP,
  created_at TIMESTAMP DEFAULT NOW()
) PARTITION BY LIST (tenant_id);

-- Create partition for each tenant
CREATE TABLE shifts_tenant_abc123 
PARTITION OF shifts 
FOR VALUES IN ('abc-123');

CREATE TABLE shifts_tenant_xyz789
PARTITION OF shifts
FOR VALUES IN ('xyz-789');

-- Indexes on partitions, not parent
CREATE INDEX idx_shifts_abc123_start_time 
ON shifts_tenant_abc123(start_time);

This gave us the operational simplicity of shared schema with much better query performance—queries only scan relevant partitions.
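Each new shared-tier tenant needs its partition created at signup, so we script the DDL rather than write it by hand. A sketch of a generator following the naming convention in the example above (the helper name is hypothetical):

```javascript
// Build the DDL for a tenant's list partition of a parent table.
function partitionDdl(parentTable, tenantId) {
  // Hyphens in identifiers would need quoting, so strip them
  // from the partition name while keeping the raw id in the
  // FOR VALUES clause, where it's a string literal.
  const suffix = tenantId.replace(/-/g, '');
  return [
    `CREATE TABLE ${parentTable}_tenant_${suffix}`,
    `PARTITION OF ${parentTable}`,
    `FOR VALUES IN ('${tenantId}');`
  ].join('\n');
}
```

In production this runs inside tenant provisioning, together with creating the partition-local indexes.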

Intelligent Connection Routing

We built a connection router that directs queries to the appropriate database:

class DatabaseRouter {
  constructor() {
    this.sharedPool = new Pool(sharedDbConfig);
    this.dedicatedPools = new Map();
  }

  async getConnection(tenantId) {
    const tenant = await this.getTenantConfig(tenantId);

    if (tenant.tier === 'dedicated') {
      if (!this.dedicatedPools.has(tenantId)) {
        this.dedicatedPools.set(
          tenantId, 
          new Pool(tenant.dbConfig)
        );
      }
      return this.dedicatedPools.get(tenantId);
    }

    // Use shared pool for small/medium tenants
    return this.sharedPool;
  }

  // Cleanup unused connections periodically
  async cleanup() {
    for (const [tenantId, pool] of this.dedicatedPools) {
      const lastUsed = this.getLastUsedTime(tenantId);
      if (Date.now() - lastUsed > IDLE_TIMEOUT) {
        await pool.end();
        this.dedicatedPools.delete(tenantId);
      }
    }
  }
}

Migration Framework

We built tooling to handle migrations across our heterogeneous database setup:

class MigrationRunner {
  async runMigration(migration) {
    // Run on shared database
    await this.runOnDatabase(sharedDb, migration);

    // Run on each dedicated database
    for (const tenantId of this.getDedicatedTenants()) {
      try {
        const db = await this.getConnection(tenantId);
        await this.runOnDatabase(db, migration);
        await this.recordSuccess(tenantId, migration);
      } catch (error) {
        await this.recordFailure(tenantId, migration, error);
        // Continue with other tenants
      }
    }

    // Generate report of migration status
    return this.getMigrationReport(migration);
  }
}

Tenant Migration Strategy

We needed a way to migrate tenants between tiers as they grew:

class TenantMigrator {
  async promoteToDedicated(tenantId) {
    // 1. Provision new dedicated database
    const newDb = await this.provisionDatabase(tenantId);

    // 2. Copy data from shared to dedicated
    await this.copyTenantData(tenantId, sharedDb, newDb);

    // 3. Enable dual-writes (to both databases)
    await this.enableDualWrites(tenantId);

    // 4. Wait for replication to catch up
    await this.waitForSync(tenantId);

    // 5. Switch reads to new database
    await this.cutoverReads(tenantId);

    // 6. Disable writes to old database
    await this.disableDualWrites(tenantId);

    // 7. Clean up old data after safety period
    await this.scheduleCleanup(tenantId, sharedDb);
  }
}
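Step 3, dual-writes, is the riskiest part of that sequence: a failure on the new database must not break the tenant mid-migration. A sketch of how a dual-write wrapper can behave during the cutover window (the `sharedDb`/`dedicatedDb` handles and error handling are simplified assumptions):

```javascript
// During migration, apply each write to both databases.
// The shared DB is still the source of truth, so its failure
// aborts the operation; a dedicated-DB failure is only logged
// and reconciled later by the waitForSync step.
async function dualWrite(sharedDb, dedicatedDb, writeFn) {
  const result = await writeFn(sharedDb);  // must succeed
  try {
    await writeFn(dedicatedDb);            // best effort
  } catch (err) {
    console.error('dual-write lag, will re-sync:', err.message);
  }
  return result;
}
```

The asymmetry is deliberate: until cutover, losing a write on the dedicated side is recoverable, but losing one on the shared side is data loss.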

Restaurant-Specific Database Challenges

Beyond general multi-tenancy concerns, restaurant software has domain-specific challenges:

Time Zone Complexity

Restaurants operate in specific time zones, but corporate users might view data across multiple time zones:

-- Bad: storing local times without timezone awareness
CREATE TABLE shifts (
  start_time TIMESTAMP,  -- Which timezone?
  end_time TIMESTAMP
);

-- Better: always use timezone-aware timestamps
CREATE TABLE shifts (
  start_time TIMESTAMPTZ,
  end_time TIMESTAMPTZ,
  timezone VARCHAR(50)  -- Store restaurant's timezone
);

-- Query helper
SELECT 
  start_time AT TIME ZONE restaurants.timezone AS local_start_time,
  start_time AT TIME ZONE 'UTC' AS utc_start_time
FROM shifts
JOIN restaurants ON shifts.restaurant_id = restaurants.id;
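On the application side, the same conversion can be done at display time with the built-in Intl API, so the server keeps everything in UTC and only the presentation layer localizes. A sketch (assumes Node built with full ICU, which is the default in modern releases):

```javascript
// Format a UTC instant as the hour in a restaurant's time zone.
function localHour(utcIso, timeZone) {
  return new Intl.DateTimeFormat('en-US', {
    timeZone,
    hour: '2-digit',
    hour12: false
  }).format(new Date(utcIso));
}

// 18:00 UTC on 2024-01-15 is 13:00 in New York (EST)
// and 10:00 in Los Angeles (PST).
```

Keeping the restaurant's IANA zone name (not a fixed offset) in the `timezone` column is what makes DST transitions come out right.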

High-Frequency Transaction Data

Point-of-sale transactions generate enormous data volumes:

// Separate hot and cold data
const OrderSchema = {
  // Hot: current/recent orders
  orders: {
    tableName: 'orders',
    // Partition by created_at for efficient archival
  },

  // Cold: historical orders
  orders_archive: {
    tableName: 'orders_archive',
    // Moved here after 90 days
  }
};

// Automatic archival process
// Automatic archival process. The copy and the delete must run in
// one transaction: NOW() is then fixed at transaction start, so both
// statements see the same cutoff, and no row can be deleted without
// having been copied first.
async function archiveOldOrders() {
  await db.query('BEGIN');
  try {
    await db.query(`
      INSERT INTO orders_archive
      SELECT * FROM orders
      WHERE created_at < NOW() - INTERVAL '90 days'
    `);

    await db.query(`
      DELETE FROM orders
      WHERE created_at < NOW() - INTERVAL '90 days'
    `);

    await db.query('COMMIT');
  } catch (err) {
    await db.query('ROLLBACK');
    throw err;
  }
}

Menu and Pricing Historization

Restaurants need to track menu changes over time for accurate historical reporting:

-- Naive approach: updates lose history
CREATE TABLE menu_items (
  id UUID PRIMARY KEY,
  name VARCHAR(255),
  price DECIMAL(10,2),
  updated_at TIMESTAMP
);

-- Better: maintain price history
CREATE TABLE menu_items (
  id UUID PRIMARY KEY,
  name VARCHAR(255),
  current_price_id UUID REFERENCES prices(id)
);

CREATE TABLE prices (
  id UUID PRIMARY KEY,
  menu_item_id UUID REFERENCES menu_items(id),
  price DECIMAL(10,2),
  valid_from TIMESTAMP,
  valid_until TIMESTAMP,  -- NULL for current price
  created_by UUID
);

-- Query current price
SELECT p.price FROM menu_items m
JOIN prices p ON m.current_price_id = p.id
WHERE m.id = 'item-id';

-- Query historical price
SELECT p.price FROM prices p
WHERE p.menu_item_id = 'item-id'
  AND p.valid_from <= '2024-01-15'
  AND (p.valid_until IS NULL OR p.valid_until > '2024-01-15');

Reservation and Booking Complexity

Table management requires handling complex constraints:

CREATE TABLE table_bookings (
  id UUID PRIMARY KEY,
  tenant_id UUID,
  table_id UUID,
  booking_time TIMESTAMPTZ,
  duration_minutes INTEGER,
  party_size INTEGER,
  status VARCHAR(50),

  -- Constraint: prevent double-booking.
  -- Requires the btree_gist extension for the UUID equality part:
  --   CREATE EXTENSION IF NOT EXISTS btree_gist;
  EXCLUDE USING gist (
    table_id WITH =,
    tstzrange(
      booking_time, 
      booking_time + (duration_minutes || ' minutes')::INTERVAL
    ) WITH &&
  ) WHERE (status != 'cancelled')
);
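The exclusion constraint is the real guarantee, but pre-checking overlaps in the application lets us show a friendly error before the insert ever fails. The interval logic is just the classic half-open-range test; a sketch:

```javascript
// Two bookings conflict iff their half-open time ranges intersect:
// [startA, endA) overlaps [startB, endB) iff startA < endB && startB < endA.
// Half-open ranges let back-to-back bookings share a boundary.
function bookingsOverlap(startA, minutesA, startB, minutesB) {
  const endA = new Date(startA.getTime() + minutesA * 60000);
  const endB = new Date(startB.getTime() + minutesB * 60000);
  return startA < endB && startB < endA;
}
```

This mirrors the `tstzrange ... WITH &&` semantics of the constraint above, so the two checks can't disagree on boundary cases.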

Performance Lessons Learned

Lesson 1: Index Strategy Matters More Than You Think

We initially added indexes liberally. This caused problems:

-- Bad: too many indexes slow down writes
CREATE INDEX idx_shifts_tenant ON shifts(tenant_id);
CREATE INDEX idx_shifts_restaurant ON shifts(restaurant_id);
CREATE INDEX idx_shifts_staff ON shifts(staff_member_id);
CREATE INDEX idx_shifts_start ON shifts(start_time);
CREATE INDEX idx_shifts_end ON shifts(end_time);
CREATE INDEX idx_shifts_created ON shifts(created_at);

-- Better: composite indexes for common query patterns
CREATE INDEX idx_shifts_tenant_start 
ON shifts(tenant_id, start_time) 
INCLUDE (restaurant_id, staff_member_id);

CREATE INDEX idx_shifts_staff_date 
ON shifts(staff_member_id, start_time)
WHERE end_time IS NULL;  -- Partial index for active shifts

Lesson 2: Connection Pooling is Critical

We learned this the hard way:

// Bad: new connection per request
app.use(async (req, res, next) => {
  req.db = await createConnection(dbConfig);
  next();
});

// Better: shared pool with limits
const pool = new Pool({
  max: 20,  // Maximum connections
  min: 5,   // Minimum connections
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,

  // This saved us during traffic spikes
  maxWaitingClients: 100
});

Lesson 3: Query Timeouts Prevent Cascading Failures

One slow query shouldn't take down the whole system:

// Set a statement timeout for the current session
await client.query('SET statement_timeout = 5000');  // 5 seconds

// Or set a default when creating the pool — node-postgres
// passes statement_timeout through to the server
const pool = new Pool({
  statement_timeout: 10000  // 10-second ceiling for every query
});

Security and Data Isolation

Multi-tenant databases require extra security vigilance:

Row-Level Security

PostgreSQL's RLS provides an extra safety layer:

-- Enable RLS
ALTER TABLE shifts ENABLE ROW LEVEL SECURITY;

-- Policy: users can only see their tenant's data
CREATE POLICY tenant_isolation ON shifts
FOR ALL
USING (tenant_id = current_setting('app.current_tenant_id')::UUID);

-- Set tenant context at connection time
SET app.current_tenant_id = 'abc-123';
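Setting that tenant context safely from Node means scoping it to a transaction, so a pooled connection can't leak one tenant's setting into the next request. A sketch using `set_config` with `is_local = true` (pool handling simplified):

```javascript
// Run fn with the RLS tenant context scoped to one transaction.
// set_config(..., true) makes the setting transaction-local, so it
// vanishes on COMMIT/ROLLBACK — nothing lingers on the pooled
// connection for the next tenant to inherit.
async function withTenant(pool, tenantId, fn) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    await client.query(
      "SELECT set_config('app.current_tenant_id', $1, true)",
      [tenantId]
    );
    const result = await fn(client);
    await client.query('COMMIT');
    return result;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}
```

One caveat worth knowing: RLS policies don't apply to the table owner unless you also run `ALTER TABLE ... FORCE ROW LEVEL SECURITY`, so the application role should never own the tables.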

Application-Level Safeguards

Defense in depth requires application-level checks too:

class TenantAwareModel {
  static async find(id, tenantId) {
    // Always include tenant_id in where clause
    const result = await db.query(
      'SELECT * FROM shifts WHERE id = $1 AND tenant_id = $2',
      [id, tenantId]
    );

    if (!result.rows.length) {
      throw new NotFoundError('Shift not found');
    }

    return result.rows[0];
  }
}

Audit Logging

Track all cross-tenant access attempts:

async function queryWithAudit(sql, params, tenantId) {
  const startTime = Date.now();

  try {
    const result = await db.query(sql, params);

    // Log successful queries
    await auditLog.record({
      tenantId,
      sql: sql.substring(0, 1000),  // Truncate long queries
      duration: Date.now() - startTime,
      rowCount: result.rows.length
    });

    return result;
  } catch (error) {
    // Log failures too
    await auditLog.record({
      tenantId,
      sql: sql.substring(0, 1000),
      error: error.message,
      duration: Date.now() - startTime
    });
    throw error;
  }
}

Operational Realities

The technical design is only part of the story. Operations matter just as much:

Backup Strategy

Different tenants need different backup strategies:

// Small tenants: shared database backup
await backupDatabase(sharedDb, {
  frequency: 'daily',
  retention: '30 days'
});

// Large tenants: individual backups
for (const tenantId of largeTenants) {
  await backupDatabase(getTenantDb(tenantId), {
    frequency: 'hourly',
    retention: '90 days',
    pointInTimeRecovery: true
  });
}

Monitoring and Alerting

We monitor different metrics based on tenant tier:

// Shared database: watch for slow queries affecting multiple tenants
monitorSharedDb({
  slowQueryThreshold: 500,  // ms
  alertOn: (query) => {
    return query.scansMultiplePartitions;
  }
});

// Dedicated databases: watch for individual tenant issues
for (const tenantId of dedicatedTenants) {
  monitorTenantDb(tenantId, {
    connectionPoolUtilization: 0.8,
    queryTimeoutRate: 0.01,
    diskUtilization: 0.75
  });
}

Conclusion: There Is No Perfect Solution

After two years of evolution, our hybrid approach works well for BistroBee, but it's not perfect. We still face challenges:

  • Database migrations are complex
  • Connection management requires vigilance
  • Deciding when to migrate tenants between tiers involves judgment calls
  • Operational overhead is higher than a pure shared-schema approach

But the tradeoffs are worth it. We can offer appropriate isolation and performance to each customer segment without the operational burden of treating every tenant identically.

The biggest lesson: multi-tenant database design is about understanding your customers' needs and usage patterns, then choosing architecture that serves those needs—not about finding the theoretically "best" approach.

For BistroBee, that meant accepting complexity in exchange for flexibility. For your application, the right tradeoffs might be different.

The key is making those decisions deliberately, understanding the consequences, and building systems that can evolve as your understanding improves.


What's your experience with multi-tenant database design? What approaches have worked well for your applications, and what unexpected challenges have you encountered? Share your experiences in the comments.
