Design Data Model for Multi-Tenant RBAC

#erd #database #schema #databasedesign

Part 1: Identity Foundation

Behavior Description

Assume we are building a KPI management platform. Multiple companies can create accounts (tenants) and let their employees access the platform.

A user can be assigned one or more roles. Their effective permissions are the union of those roles' permissions.
A user cannot access features outside their permissions. Even if they know the API specs and bypass the UI, they will still be blocked at the API layer.
Admins can change a user's roles or a role's permissions at any time. The only inconvenience is that the affected user needs to log out and log back in for the changes to take effect.
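
The three rules above can be sketched in a few lines of Python. This is a minimal illustration, not the platform's actual implementation: the role names, the permission strings, and the idea that permissions are computed once at login and kept with the session are all assumptions. The last one is simply one common design that would explain why a role change only takes effect after a fresh login.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    name: str
    permissions: frozenset[str]


def effective_permissions(roles: list[Role]) -> frozenset[str]:
    """Union of the permissions granted by every assigned role."""
    granted: set[str] = set()
    for role in roles:
        granted |= role.permissions
    return frozenset(granted)


def authorize(session_permissions: frozenset[str], required: str) -> None:
    """API-layer guard: runs on every request, independent of the UI."""
    if required not in session_permissions:
        raise PermissionError(f"missing permission: {required}")


# Hypothetical roles and permission names, for illustration only.
viewer = Role("viewer", frozenset({"kpi:read"}))
editor = Role("editor", frozenset({"kpi:read", "kpi:write"}))

# Assumption: computed once at login and stored with the session, so a
# later role change is only picked up at the next login.
session_permissions = effective_permissions([viewer, editor])
authorize(session_permissions, "kpi:write")  # passes silently
```

A user holding only the viewer role would get a PermissionError from the same call, regardless of what the UI shows or hides.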

Sounds straightforward, right? Let's look at what happens under the hood.

Deriving the Data Model from Use Cases

Before implementing, I take time to think about what the system needs to do and what data structures are required to support those use cases.

The first thing the spec mentions is "multiple tenants": a system that serves multiple companies and their users. That gives us at least two objects to start with: tenants and users.

Note: I use the plural convention for table names throughout this post.

Defining the Relationship Between Tenants and Users

Now we have the foundation, but the relationship between tenants and users is not yet defined. To define it, we need to find the right constraint. There are two cases:

Case 1: One tenant can have many users. An email can be used in only one tenant, so a person needs a different email for each tenant they join.
Case 2: One tenant can have many users. The same email can be used in different tenants.

Consider this scenario: you are a contractor working for two different companies that both use this platform. If we go with Case 1, the best you can do is maintain two separate emails for two separate tenants on the same system. That is a real inconvenience.

Notice the difference between a person and a user here: the contractor is one person, even though they are a user of two tenants. Technically, every tenant is independent. There is no good reason to force a person to use a different email just to become a user of another tenant.

Case 2 is the preferred design.

Tenant Identification at Login

This raises a natural question: if a user can log in with the same email across multiple tenants, how do we know which tenant they belong to?

The answer is that when a user logs in, we need to identify the tenant. There are a few approaches:

Ask the user to type a tenant code in the login form. Bad idea.
Provide a typeahead field to help the user find their tenant code. Still a bad idea.
Give each tenant a subdomain or sub-path, such as tenant1.ourplatform.com or ourplatform.com/tenant1. This is the widely accepted approach, especially the subdomain variant. The tradeoff for the subdomain variant is that a standard single-host SSL certificate is no longer enough: you need a wildcard SSL certificate (or a separate certificate per tenant subdomain).
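
As a rough sketch of the third approach, the tenant can be read from the request's Host header or from the first path segment. The function names and the single-label rule below are assumptions made for illustration; the returned slug would still need to be looked up in the tenants table before it is trusted.

```python
from typing import Optional

BASE_DOMAIN = "ourplatform.com"  # the example domain used above


def tenant_from_host(host: str, base_domain: str = BASE_DOMAIN) -> Optional[str]:
    """Return the tenant slug from a Host header like 'tenant1.ourplatform.com'."""
    # Drop an optional port, normalize case, and strip a trailing dot.
    hostname = host.split(":", 1)[0].lower().rstrip(".")
    suffix = "." + base_domain
    if not hostname.endswith(suffix):
        return None
    slug = hostname[: -len(suffix)]
    # Assumption: exactly one label in front of the base domain.
    if not slug or "." in slug:
        return None
    return slug


def tenant_from_path(path: str) -> Optional[str]:
    """Return the tenant slug from a path like '/tenant1/dashboard'."""
    first_segment = path.lstrip("/").split("/", 1)[0]
    return first_segment or None


print(tenant_from_host("tenant1.ourplatform.com"))  # tenant1
print(tenant_from_path("/tenant1/dashboard"))       # tenant1
```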

The Identity Table

Another natural question follows: if users share an email across tenants, can they also share a password?

No. Password policies can differ per tenant, so sharing passwords across tenants is not acceptable. This leads us to a third object: the identities table. An identity represents a user's membership in a specific tenant and holds the information specific to that tenant, such as the password.

The relationship is clear: one user can have many identities, and each identity belongs to exactly one tenant. With this simple three-table model - tenants, users, and identities - we satisfy the core constraint cleanly: a person can use the same email across multiple independent tenants.
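
To make the three-table model concrete, here is a minimal sketch using Python's built-in sqlite3 module. The column names (slug, email, password_hash) and the uniqueness rules are my assumptions rather than a fixed schema: the email is unique in users, while identities allows one row per user per tenant and carries the tenant-specific password hash.

```python
import sqlite3

SCHEMA = """
CREATE TABLE tenants (
    id   INTEGER PRIMARY KEY,
    slug TEXT NOT NULL UNIQUE
);
CREATE TABLE users (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE          -- one row per person
);
CREATE TABLE identities (
    id            INTEGER PRIMARY KEY,
    user_id       INTEGER NOT NULL REFERENCES users(id),
    tenant_id     INTEGER NOT NULL REFERENCES tenants(id),
    password_hash TEXT NOT NULL,        -- tenant-specific credential
    UNIQUE (user_id, tenant_id)         -- at most one identity per tenant
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)

# Placeholder data: one contractor who works for two tenants.
conn.executemany(
    "INSERT INTO tenants (id, slug) VALUES (?, ?)",
    [(1, "tenant1"), (2, "tenant2")],
)
conn.execute("INSERT INTO users (id, email) VALUES (1, 'contractor@example.com')")
conn.executemany(
    "INSERT INTO identities (user_id, tenant_id, password_hash) VALUES (?, ?, ?)",
    [(1, 1, "hash-for-tenant1"), (1, 2, "hash-for-tenant2")],
)

identity_count = conn.execute(
    "SELECT COUNT(*) FROM identities WHERE user_id = 1"
).fetchone()[0]
print(identity_count)  # 2: same email, two independent tenant memberships
```

Trying to insert a second identity for the same user and tenant raises sqlite3.IntegrityError, which is the database enforcing the constraint for us.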

Recognizing the Many-to-Many Relationship

Look closely at the diagram. Both 1:N relationships, users to identities and tenants to identities, have their N end on the identities table. This is a sign that users and tenants have an N:N relationship - a user can belong to multiple tenants, and a tenant can have multiple users. The identities table is effectively the join table.
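
One way to see the join table at work is to traverse it in both directions. The sketch below uses sqlite3 with the tables trimmed to their key columns; the column names, helper functions, and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tenants (id INTEGER PRIMARY KEY, slug TEXT NOT NULL UNIQUE);
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE);
CREATE TABLE identities (
    id        INTEGER PRIMARY KEY,
    user_id   INTEGER NOT NULL REFERENCES users(id),
    tenant_id INTEGER NOT NULL REFERENCES tenants(id),
    UNIQUE (user_id, tenant_id)
);
INSERT INTO tenants (id, slug) VALUES (1, 'tenant1'), (2, 'tenant2');
INSERT INTO users (id, email) VALUES (1, 'alice@example.com'), (2, 'bob@example.com');
INSERT INTO identities (user_id, tenant_id) VALUES (1, 1), (1, 2), (2, 1);
""")


def tenants_of(email: str) -> list[str]:
    """All tenants a user belongs to, found through identities."""
    rows = conn.execute(
        """SELECT t.slug FROM tenants t
           JOIN identities i ON i.tenant_id = t.id
           JOIN users u ON u.id = i.user_id
           WHERE u.email = ? ORDER BY t.slug""",
        (email,),
    )
    return [slug for (slug,) in rows]


def users_of(slug: str) -> list[str]:
    """All users of a tenant, found through the same join table."""
    rows = conn.execute(
        """SELECT u.email FROM users u
           JOIN identities i ON i.user_id = u.id
           JOIN tenants t ON t.id = i.tenant_id
           WHERE t.slug = ? ORDER BY u.email""",
        (slug,),
    )
    return [email for (email,) in rows]


print(tenants_of("alice@example.com"))  # ['tenant1', 'tenant2']
print(users_of("tenant1"))              # ['alice@example.com', 'bob@example.com']
```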

With that insight, we can express the data model more precisely: