muthandir

Posted on Mar 24, 2022

How do you handle the data ownership of a multi-tenant product?

#saas

Multi tenancy is very crucial for most of the SAAS products and I think, data ownership is one of the most important aspects of multi tenancy. So how do you deal with the data ownership?

(Note: by "data ownership", I mean:
-customer1 can only read customer1's data (and can never access customer2's data) or
-customer1 can only upsert data for customer1.

or in an enterprise situation,
-user1 of customer1 can only read customer1's data (and can never access customer2's data) or
-user1 of customer1 can only upsert data for customer1

Top comments (5)

Kasey Speakman • Mar 25 '22

There's a primer here.

You can choose isolated vs shared at different levels, depending on what's important to your customers. In one app, we do schema isolation. For each tenant we create a schema and a db user with access to only that schema. Both of these match the tenant id. Before we run any query, we get the tenant associated with the user and switch to the tenant's db user/schema. So no cross-tenant access is possible. But this is a pain when you need to query across tenants. Example: to get billable usage by tenant, have to run a query per schema and union them together. And we have to run a provisioning process before new tenants can access it.

If you go with full isolation, you can customize code, data structures, operational guarantees, etc. per customer. But it's the most expensive way -- you have to maintain each customized version. And resources can sit idle (wasted cost) much of the time.

Shared (pooled) resources are the most economical, but data isolation is more of a challenge. Examples: row-level security, but it may affect data design. Encrypt each tenant's data with a separate key. Bake tenant filtering into the code infrastructure.

Things get more complicated when arbitrary connections between entities allow access. Such as an accepted friend request. You might use make-shift graph tables and recursive CTEs in small apps or a graph database in larger apps to trace access. Example: Alice -(friend)-> Bob -(post)->Vacation -(attachment)-> Photo123.jpg. Found a path, so Alice is allowed to open the photo.

Tobias Haindl • Mar 25 '22

A great summary of different approaches can be found here.
I also liked the simple explanation from this stackoverflow answer.
Keep in mind that the best solution also depends on your database product.
E.g. in Postgres multi tenancy can be implemented via row level security (more information here).

Thorsten Hirsch • Mar 25 '22

I started with complete isolation, but our customers then had the need to share data. So now I have two kinds of components:

component A, which is specific to each tenant (full isolation) -> every tenant has his own instance of component A
components B and C, shared between all tenants -> there's just a single instance of B and C

The permissions of all users are held in component B, a shared one. Why? Because it enables super users which have write permissions not only in their instance of A, but also on B and C. But the default user permissions are what you describe (write permissions in their instance of A).

muthandir • Mar 25 '22 • Edited

Great documents, let me also share an article from Michael Beardsley. In my case I think there is a challenging complexity, (lemme see if I can describe it properly)

As Kasey mentions it is about the shared access (I liked the graph idea btw).

Personas:
-Company employees
-Travel Agency Consultants
-Hotel Front Desk Team

Company employees can create bookings.
Travel Consultants can review and make modifications on the bookings.
Hotels can review and make modifications on the bookings.

booking1 company-A travel-agency-B hotel-C
booking2 company-B travel-agency-C hotel-C
booking3 company-C travel-agency-B hotel-D

hotel-C can read booking1 and booking2,
travel-agency-B can read booking1 and booking3,
company-A can read booking1 only.

I think this is a fairly complex use case, and in this scenario, I've gone with 3 microservices (companyApi, agencyApi, hotelApi) that pass different tenancy information stored on different columns in the DB. Kind of similar to Michael Beardsley's post above (section: Alternative approach):

If you don’t want to create and maintain PostgreSQL users for each of your tenants, you can still use a shared PostgreSQL login for your application. However, you need to define a runtime parameter to hold the current tenant context of your application.

muthandir • Mar 24 '22 • Edited

I'll go first. So I have this SAAS project, I use node.js and I want to use sequelize for the ORM solution. But I don't want to expose the concrete ORM selection to my colleagues. So I created an NPM package, which internally uses sequelize and exposes sql manipulation functionalities using sequlize as an internal dependency. This library injects customer_id in all type of sql statements sequelize creates. Something like:
select * from products where user_id=3 and customer_id=4 <= customer_id gets inserted by our in-house library.
or insert into products (x,y,z, customer_id) values (1,2,3,4)<= customer_id added by our in-house library
or update products set x=y where id=xyz and customer_id=4 <= customer_id added by our in-house library
Note: This is a fairly simplified summary of what the library does.