DEV Community

Istvan Szabo
Istvan Szabo

Posted on

Multi-site schemas: lessons from evolving one large system over time

Multi-site schemas: lessons from evolving one large system over time

I worked on a large multi-site system where the data model had to evolve as the product grew.

It started out simple, but as always, with progress the demand also grew to something that allowed flexibility.

The progression was:

  1. site_id
  2. many-to-many relationships
  3. scope_id

Each stage solved the limitations of the previous one.

Looking back, I would skip directly to the final model.


1. Start simple: site_id

The original model was:

products
- id
- site_id
- name
Enter fullscreen mode Exit fullscreen mode

Queries were simple:

WHERE site_id = ?
Enter fullscreen mode Exit fullscreen mode

In practice, the active site was usually resolved from the request domain.

Examples:

  • us.example.com → site 1
  • eu.example.com → site 2

So each request enters the system with a current site_id, and that drives filtering, permissions, and configuration.

This model is easy to understand, easy to query, and operationally simple.


Where it stopped being enough

Once entities needed selective sharing across sites, site_id became restrictive.

Typical workaround:

site_id = 0
Enter fullscreen mode Exit fullscreen mode

Meaning global.

That handles:

  • one site
  • all sites

But not:

  • site A + B only
  • region-specific sharing
  • custom groups of sites

2. Add relationships: many-to-many

The next idea was relationship tables.

products
- id
- name

product_sites
- product_id
- site_id
Enter fullscreen mode Exit fullscreen mode

Now entities could belong to any set of sites.

That solved the sharing problem.


Why I turned away from this approach

Only some data benefits from many-to-many visibility.

Examples:

Good candidates:

  • products
  • templates
  • assets

Poor candidates:

  • orders
  • logs
  • analytics events
  • transactions

So the system would be using two patterns:

WHERE site_id = ?
Enter fullscreen mode Exit fullscreen mode

and

JOIN product_sites ...
Enter fullscreen mode Exit fullscreen mode

That inconsistency spreads over time.

Different repositories, query builders, exports, reports, and jobs all need to know which model applies.


3. The real requirement was visibility

The actual question most features were asking was not:

Which site owns this?

It was:

Which sites can access this?

That is a visibility problem.

Visibility is naturally represented as a reusable set.


4. Introduce scope_id

Instead of attaching sites directly to every entity, entities reference a scope.

- id
- scope_id
- name
Enter fullscreen mode Exit fullscreen mode
scopes
- id
- hash
Enter fullscreen mode Exit fullscreen mode
scope_members
- scope_id
- site_id
Enter fullscreen mode Exit fullscreen mode

With:

scope_id = 0
Enter fullscreen mode Exit fullscreen mode

Reserved for global visibility.


What a scope represents

Examples:

  • 0 → visible everywhere
  • scope 12 → site [1]
  • scope 27 → sites [1,2,5]

The entity only stores visibility.

No direct ownership rules. No per-entity join tables.


5. Why this improved the system

The biggest benefit was consistency.

Reads normalize to one pattern:

WHERE scope_id IN (0, ...)
Enter fullscreen mode Exit fullscreen mode

Where the allowed scopes for the active site are resolved once and cached.

That means:

  • one read model
  • no switching between joins and direct filters
  • no special handling for global records
  • simpler shared infrastructure code

6. Writes become explicit

Visibility is assigned intentionally:

scope_id = Scope::forSite(1);

scope_id = Scope::forSites([1,2,5]);

scope_id = 0;
Enter fullscreen mode Exit fullscreen mode

That centralizes the logic into one helper layer.


7. Why duplication becomes expensive

A common workaround for multi-region systems is cloning records into each site.

That seems fine initially.

Later it creates ongoing problems:

  • global identifiers
  • syncing edits across copies
  • partial overrides
  • having to worry about the source of truth
  • larger data size that could directly affect performance
  • rollout coordination

At scale, managing copies becomes harder than managing access.

A shared entity with scoped visibility avoids most of that.

One identity, multiple sites.


8. What I would do now

If I were building this again, I would start with scope_id.

It is close enough to site_id in simplicity:

scope_id = Scope::forSite(1)
Enter fullscreen mode Exit fullscreen mode

But it avoids the later refactor when sharing becomes necessary.

Single-site behavior still works naturally.

Shared visibility is already built in.

Global visibility already exists.

There is little benefit in starting with a narrower model if the broader one carries almost no real penalty.


Final thought

The strongest argument for scope_id is not just flexibility.

It is that it handles:

  • isolated sites
  • shared entities
  • global entities
  • future growth

using one consistent model.

If a design solves today’s requirements and tomorrow’s likely requirements without meaningful tradeoffs, it is usually the better starting point.

Top comments (0)