DEV Community

Mustafa ERBAY
Mustafa ERBAY

Posted on • Originally published at mustafaerbay.com.tr

Multi-Tenant ERP: The Risks of a Shared Schema

Why Is the Shared Schema Approach Risky in Multi-Tenant ERPs?

Imagine a scenario in a production ERP where, instead of setting up isolated data structures on the same database for different clients, we try to manage all clients' data within a single database but through different tables or columns. This forms the basis of the "shared schema" approach. While this approach initially offers attractive benefits like efficient resource utilization and ease of management, it harbors serious risks, especially in enterprise-level ERP systems dealing with sensitive data. My years of experience in system architecture and database management show that this kind of "cheap" solution can cost much more in the long run.

In this post, we will take a deep dive into why the shared schema approach is a trap, what technical and security issues it causes, using real-world scenarios and concrete data. We will also discuss which alternative architectural approaches are more secure and scalable to avoid these risks. My goal is to provide an analysis so clear that it will make you say "that's exactly why I shouldn't have chosen this path" when you encounter it.

The Appeal of Shared Schema and Its Initial Advantages

At its simplest, shared schema architecture means that data from multiple "tenants" (clients, user groups, or departments) is stored on the same database server, usually within the same database but in different tables or using an additional tenant ID column in the same tables. The first advantages of this model that come to mind are:

  • Resource Optimization: A single database server and a single database instance may require fewer hardware resources (CPU, RAM, disk) compared to multiple database instances. This can be attractive especially for projects looking to lower initial costs.
  • Ease of Management: Backing up, maintaining, and applying updates to a single database instance is simpler than managing hundreds or thousands of separate database instances. Automation scripts become less complex.
  • Database Cost: If licensing costs (especially in commercial databases) are calculated based on the number of database instances, shared schema can significantly reduce costs.
  • Development Speed (Initial Phase): For the application development team, working on a single database schema can start faster because they don't have to deal with the complexity of setting up and managing a separate database infrastructure for each tenant.

Once upon a time, I considered this kind of approach for small-scale projects or internal tools. For example, managing user data for my personal blog or a simple task management tool in a single database seemed more practical than spinning up a separate PostgreSQL instance. However, there were critical factors I overlooked or didn't value enough when making this decision, which can lead to absolute disaster scenarios for enterprise-level ERP applications.

ℹ️ What Is a Tenant?

In multi-tenant architecture, a "tenant" refers to an independent customer, organization, or user group using the software. Each tenant is expected to have their own data and settings.

Data Isolation Issues: The Security and Compliance Dilemma

The biggest weakness of the shared schema is the difficulty of ensuring data isolation between tenants. When we say "shared schema", two main approaches come to mind:

  1. Shared Database, Shared Schema: All tenants' data is kept in the same tables within the same database. Every table has a tenant_id column, and queries are filtered according to this tenant_id.
  2. Shared Database, Separate Schemas: All tenants live on the same database server, but a separate database schema (as in PostgreSQL) is created for each tenant.

The first approach (shared schema) is the riskiest. It is the application's responsibility to correctly specify the tenant_id column in every query. If a developer misses this filtering or accidentally writes a query that can access another tenant's data, a data privacy breach is inevitable.

For example, in an e-commerce ERP, when you want to list Customer A's orders, the application needs to run the following query in the background:

SELECT * FROM orders WHERE tenant_id = 'müşteri_a_id';
Enter fullscreen mode Exit fullscreen mode

If the developer forgets the WHERE tenant_id = 'müşteri_a_id' part or accidentally writes WHERE tenant_id = 'müşteri_b_id', this leads to the leakage of the other customer's sensitive order information. Such mistakes can happen frequently, especially in large teams and fast development cycles. In my own experience, while developing an ERP for a manufacturing firm about 2 years ago, an operator screen was supposed to show only the production orders of their own department. However, due to a bug, they could see the production orders of the entire factory. This situation arose from the mistake of a few developers and was caused by the lack of user_id filtering. This was a simple omission of tenant_id, and its consequences were a serious data privacy risk.

Additionally, data protection regulations like GDPR or KVKK mandate the isolation of personal data and its protection against unauthorized access. Achieving this compliance in the shared schema model means ensuring every query is correctly filtered, which is highly prone to human error.

⚠️ Data Isolation and Security Risks

Data isolation in a shared schema is heavily dependent on application logic. A single mistake can cause the accidental exposure of all customers' data. This is an unacceptable risk from both a security and regulatory compliance perspective.

Performance and Scalability Constraints

Another major issue with the shared schema is its constraints on performance and scalability. As the number of tenants increases or the data volume of individual tenants grows, read/write operations concentrated on the same database and tables lead to serious performance bottlenecks.

For example, let's assume a table contains millions of rows, and each of these rows belongs to different tenants. Even if indexing is done on tenant_id in this table, queries will generally have to scan this massive index. A query from one tenant can cause I/O operations that could affect the data of other tenants.

💡 Indexing and Performance

Adding an index to the tenant_id column in PostgreSQL can speed up queries. However, having tenant_id as an index on its own does not guarantee that queries will only retrieve data belonging to that tenant. It is a prerequisite that the query contains the WHERE tenant_id = ... condition. Also, it should not be forgotten that the index will grow as the table grows.

Once, while working on the order management module of a large e-commerce platform, I came across a structure that kept data for different sellers within the same database. We observed that when a new seller joined the system or the transaction volume of existing sellers increased, the entire system slowed down. Even though there was an index on seller_id in the orders table, the overall performance of the queries was dropping. The reason was not only that queries retrieved only the relevant seller's data, but also that the overall load on the database server increased. A heavy transaction performed by one seller affected the transactions of other sellers as well. This situation clearly demonstrated the scalability limits of the shared schema.

Furthermore, a heavy operation performed by a tenant in the database (for example, running a large report or bulk data update) can block or slow down the operations of all other tenants. This is known as the "noisy neighbor" problem and is inherent in shared resources.

🔥 Noisy Neighbor Problem

Excessive resource utilization by one tenant negatively affects the performance of all other tenants sharing the same database and server. This can severely degrade the user experience and threaten business continuity.

Management and Maintenance Challenges

Although it offers ease of management at first glance, the shared schema actually creates serious maintenance and operational challenges in the long run.

Top comments (0)