DEV Community

Black Lover


I Built a SaaS on Top of an Open-Source Project and My Database Punished Me for It

I didn't design my database from scratch.

That's the honest truth. When I decided to build my WhatsApp Business API SaaS — targeting digital marketing agencies in India — I didn't start with a blank Postgres schema and a whiteboard. I started with a fork. Specifically, I forked Whatomate, an open-source Go-based WhatsApp platform, and told myself: "The hard part is done. I just need to add billing and org management on top."

That was about six months ago. Since then, my database has quietly humiliated me in ways I didn't expect. Not dramatically — not with crashes or data loss or anything catastrophic. Just slowly, consistently, in the way that technical debt humiliates you: by making every new feature take twice as long as it should, and by making you feel stupid for not seeing it coming.

This is that story.


Who This Is For

If you've ever:

  • Forked an open-source project and tried to commercialize it
  • Inherited a codebase and tried to add multi-tenancy to it
  • Built a SaaS and skipped the "proper" database design phase because you were in a hurry

...then some version of what happened to me has probably happened to you too, or will. This post is the thing I wish I'd read before I started.


The Starting Point: What I Forked and Why

Whatomate is a self-hosted WhatsApp Business API platform built in Go. It handles the gnarly parts of WhatsApp integration: message queuing, webhook management, media handling, contact synchronization with the WhatsApp API, campaign scheduling. It's genuinely good software — clean code, well-structured, production-tested.

The go-to-market thesis was simple: Indian digital marketing agencies spend thousands of rupees per month on SaaS WhatsApp tools like Interakt, Wati, and AiSensy. Most of those tools have the same core feature set. The differentiation is mostly UI and pricing. If I could fork a solid open-source foundation, add a proper multi-tenant billing layer, and undercut on price while matching features — there was a business there.

What I didn't account for was that the "add a billing layer" step wasn't additive. It was transformative. The business model I was building required a fundamentally different data model than the one I forked.

I would not discover this until I was already six weeks in.


The Implicit Contract I Didn't Notice

The original codebase was built with a very specific assumption baked deep into every table, every query, every foreign key relationship:

One installation = one organization.

It wasn't documented anywhere. It didn't need to be. The developer who built it was building a self-hosted tool. One team, one server, one set of WhatsApp accounts. The schema reflected that beautifully — clean, fast, well-indexed for that exact use case.

When I forked it, I inherited that assumption without realizing it.

The first time I noticed was when I started designing the org management module for SaaS mode. I went to look at where the users table connected to everything else. And I realized: it connected to everything. Users owned campaigns. Users owned contacts. Users owned message templates. Users owned WhatsApp account credentials. There was no middle layer — no organization, no tenant_id, no concept that multiple companies might share the same installation.

-- What the original schema looked like (simplified)
users
  id, email, password_hash, created_at

campaigns
  id, user_id, name, status, scheduled_at
  -- user_id FK → users.id

contacts
  id, user_id, phone, name, tags
  -- user_id FK → users.id

whatsapp_accounts
  id, user_id, phone_number, access_token, webhook_url
  -- user_id FK → users.id

message_templates
  id, user_id, name, body, language_code
  -- user_id FK → users.id

The database was built for a world where the question "which company owns this row?" never needed to be asked. I was now asking it at production scale. Or at least, staging scale — which felt like production at 11pm with chai going cold.


What Happened When I Started Adding org_id Everywhere

My first instinct was simple: add an org_id column to every table. Slap a foreign key on it. Done.

I started writing the migration. About 20 tables in, I stopped.

The problem wasn't the columns. It was the queries.

The existing Go code had hundreds of database calls — beautifully written, fast, using GORM v2 — and none of them knew org_id existed. Every query fetched data scoped to a user. Not a tenant. A user.

// What the existing query layer looked like (simplified)
func GetCampaigns(userID uint) ([]Campaign, error) {
    var campaigns []Campaign
    result := db.Where("user_id = ?", userID).Find(&campaigns)
    return campaigns, result.Error
}

// What I needed for multi-tenancy
func GetCampaigns(orgID uint, userID uint) ([]Campaign, error) {
    var campaigns []Campaign
    result := db.Where("org_id = ? AND user_id = ?", orgID, userID).Find(&campaigns)
    return campaigns, result.Error
}

If I added org_id to the tables but didn't touch the queries, I'd have a SaaS product where Agency A could theoretically access Agency B's contacts. Not because of a bug in my code. Because of a contract mismatch between the schema I was building and the queries that were already there.

The database wasn't wrong. The original code wasn't wrong. I was trying to make them do something they were never designed for.

The Full Scope of the Problem

Here's what the migration actually involved, once I mapped it out:

-- Step 1: Create the organizations table (didn't exist at all)
CREATE TABLE organizations (
    id          BIGSERIAL PRIMARY KEY,
    name        VARCHAR(255) NOT NULL,
    slug        VARCHAR(100) NOT NULL UNIQUE,
    plan_id     BIGINT REFERENCES plans(id),
    created_at  TIMESTAMP DEFAULT NOW(),
    updated_at  TIMESTAMP DEFAULT NOW()
);

-- Step 2: Add org_id to every table that needed tenant isolation
ALTER TABLE contacts          ADD COLUMN org_id BIGINT REFERENCES organizations(id);
ALTER TABLE campaigns         ADD COLUMN org_id BIGINT REFERENCES organizations(id);
ALTER TABLE message_templates ADD COLUMN org_id BIGINT REFERENCES organizations(id);
ALTER TABLE whatsapp_accounts ADD COLUMN org_id BIGINT REFERENCES organizations(id);
ALTER TABLE webhooks          ADD COLUMN org_id BIGINT REFERENCES organizations(id);
ALTER TABLE media_uploads     ADD COLUMN org_id BIGINT REFERENCES organizations(id);
ALTER TABLE api_keys          ADD COLUMN org_id BIGINT REFERENCES organizations(id);
ALTER TABLE audit_logs        ADD COLUMN org_id BIGINT REFERENCES organizations(id);
-- ... 14 more tables

-- Step 3: Create a default org for existing self-hosted data
INSERT INTO organizations (name, slug) VALUES ('Default', 'default');

-- Step 4: Backfill org_id on every existing row
UPDATE contacts          SET org_id = 1 WHERE org_id IS NULL;
UPDATE campaigns         SET org_id = 1 WHERE org_id IS NULL;
UPDATE message_templates SET org_id = 1 WHERE org_id IS NULL;
-- ... repeat for every table

-- Step 5: Add NOT NULL constraints only after backfill is complete
ALTER TABLE contacts          ALTER COLUMN org_id SET NOT NULL;
ALTER TABLE campaigns         ALTER COLUMN org_id SET NOT NULL;
ALTER TABLE message_templates ALTER COLUMN org_id SET NOT NULL;
-- ... repeat for every table

-- Step 6: Add indexes for tenant-scoped queries (composite where the
-- hot path filters on more than just org_id)
CREATE INDEX idx_contacts_org_id          ON contacts(org_id);
CREATE INDEX idx_campaigns_org_id_status  ON campaigns(org_id, status);
CREATE INDEX idx_templates_org_id         ON message_templates(org_id);
-- ... repeat for performance-critical tables

That's not a migration. That's a schema renegotiation — and it touches every service, every query, every test, every API handler. Two weeks I hadn't budgeted for. Two weeks that had a cascading effect on every deadline after them.
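One thing Step 4 glosses over: a single UPDATE across a large contacts table holds its row locks for the entire statement and generates one huge transaction. A batched backfill keeps each transaction short, at the cost of running it in a loop. A sketch, with an illustrative batch size:

```sql
-- Run repeatedly until it reports 0 rows updated; each pass is a short,
-- separately committed transaction, so locks are never held for long
UPDATE contacts
SET org_id = 1
WHERE id IN (
    SELECT id FROM contacts
    WHERE org_id IS NULL
    LIMIT 10000
);
```

The same loop applies to every table being backfilled; only after the last pass returns zero rows is it safe to run the Step 5 NOT NULL constraints.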


The Lesson That Slapped Me: Schema Is a Contract, Not Just Storage

I used to think of the database as a dumb storage layer. You put data in, you get data out, you add indexes when things get slow. The real logic lives in the application.

Building on top of someone else's schema broke that belief permanently.

The schema is the application in a lot of ways. The table names, the relationships, the nullable columns, the absence of certain foreign keys — all of it encodes decisions that were made about how the software would be used. When you change the use case without renegotiating the schema, you're making promises the database can't keep.

In my case, the original schema promised: "a user is the atomic unit of ownership."

My SaaS needed it to promise: "an organization is the atomic unit of ownership, and users belong to organizations."

Those are fundamentally different contracts. The entity-relationship model shifts like this:

Original schema:
User ──owns──> Campaigns
User ──owns──> Contacts
User ──owns──> Templates
User ──owns──> WhatsApp Accounts

SaaS schema:
Organization ──has──> Users (with roles: owner, admin, member)
Organization ──owns──> Campaigns
Organization ──owns──> Contacts
Organization ──owns──> Templates
Organization ──owns──> WhatsApp Accounts
User ──belongs to──> Organization
User ──can access──> Org's resources (filtered by role)

I couldn't patch my way from one to the other. I had to sit down and think through which tables needed org_id, which could stay user-scoped, what the query layer needed to change, what new indexes I needed, and whether existing data would survive the migration cleanly — and then execute all of it before I could ship a single paying customer.

The Tables That Don't Need org_id (And Why That Matters)

Not everything needed org isolation, and figuring out which tables didn't was as important as figuring out which ones did. There are two categories of data that are genuinely cross-org:

Global/shared data — things that exist independently of any organization:

-- These stay without org_id
plans          -- Pricing tiers (Basic, Pro, Agency) — shared across all orgs
countries      -- Country codes for phone number validation
languages      -- Supported template languages (WhatsApp API constraint)
system_logs    -- Infrastructure-level logs, not org-specific

Per-user data that isn't org-owned — things that belong to the person, not the company:

-- These stay user-scoped
user_sessions       -- Auth sessions belong to the person, not the org
user_preferences    -- UI settings, notification prefs
password_resets     -- Security flows are user-level

Getting this wrong in either direction causes problems. If you put org_id on plans, you'd have to duplicate plan records per org. If you leave org_id off contacts, you've broken tenant isolation. Mapping out every table against this taxonomy before touching a single migration command saved me from at least three mistakes I can think of.


The Role System Nobody Warned Me About

Here's something the "just add org_id" advice leaves out completely: once you have organizations, you need roles. Not immediately — but the moment your first real customer asks "can I give my team member access without giving them owner permissions," you need roles. And if your schema isn't ready for it, you're doing another two-week refactor.

I decided to build the role system upfront this time, rather than discovering I needed it mid-flight.

-- Organization membership with roles
CREATE TABLE org_members (
    id          BIGSERIAL PRIMARY KEY,
    org_id      BIGINT NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
    user_id     BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    role        VARCHAR(50) NOT NULL DEFAULT 'member',
    -- roles: 'owner' | 'admin' | 'member' | 'viewer'
    invited_by  BIGINT REFERENCES users(id),
    joined_at   TIMESTAMP,
    created_at  TIMESTAMP DEFAULT NOW(),
    UNIQUE(org_id, user_id)  -- A user can only have one role per org
);

CREATE INDEX idx_org_members_org_id  ON org_members(org_id);
CREATE INDEX idx_org_members_user_id ON org_members(user_id);

The role definitions themselves live in the application layer:

// internal/auth/roles.go
type Role string

const (
    RoleOwner  Role = "owner"   // Full access, billing, delete org
    RoleAdmin  Role = "admin"   // Full access, no billing/delete
    RoleMember Role = "member"  // Create/edit campaigns and contacts
    RoleViewer Role = "viewer"  // Read-only access
)

type Permission string

const (
    PermCreateCampaign   Permission = "campaign:create"
    PermDeleteCampaign   Permission = "campaign:delete"
    PermManageMembers    Permission = "members:manage"
    PermViewBilling      Permission = "billing:view"
    PermManageBilling    Permission = "billing:manage"
    PermDeleteOrg        Permission = "org:delete"
    PermManageAPIKeys    Permission = "api_keys:manage"
)

var RolePermissions = map[Role][]Permission{
    RoleOwner: {
        PermCreateCampaign, PermDeleteCampaign,
        PermManageMembers, PermViewBilling, PermManageBilling,
        PermDeleteOrg, PermManageAPIKeys,
    },
    RoleAdmin: {
        PermCreateCampaign, PermDeleteCampaign,
        PermManageMembers, PermViewBilling,
        PermManageAPIKeys,
    },
    RoleMember: {
        PermCreateCampaign,
    },
    RoleViewer: {},
}

func (r Role) Can(p Permission) bool {
    perms, ok := RolePermissions[r]
    if !ok {
        return false
    }
    for _, perm := range perms {
        if perm == p {
            return true
        }
    }
    return false
}
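A quick sanity check of that mapping, condensed to just the pieces needed (the types, constants, and Can method are copied from the definitions above; only the permission list is trimmed):

```go
package main

import "fmt"

type Role string
type Permission string

const (
	RoleMember Role = "member"
	RoleViewer Role = "viewer"

	PermCreateCampaign Permission = "campaign:create"
	PermViewBilling    Permission = "billing:view"
)

// Condensed from the full RolePermissions map above
var RolePermissions = map[Role][]Permission{
	RoleMember: {PermCreateCampaign},
	RoleViewer: {},
}

func (r Role) Can(p Permission) bool {
	// Ranging over a nil slice (unknown role) simply yields nothing,
	// so unknown roles get no permissions
	for _, perm := range RolePermissions[r] {
		if perm == p {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(RoleMember.Can(PermCreateCampaign)) // true
	fmt.Println(RoleMember.Can(PermViewBilling))    // false
	fmt.Println(RoleViewer.Can(PermCreateCampaign)) // false
	fmt.Println(Role("ghost").Can(PermCreateCampaign)) // false (unknown role)
}
```

The useful property of the deny-by-default shape: a typo'd or stale role string fails closed rather than open.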

The middleware that enforces this runs on every protected route:

// middleware/auth.go
func RequirePermission(perm auth.Permission) gin.HandlerFunc {
    return func(c *gin.Context) {
        member := c.MustGet("org_member").(OrgMember)
        if !auth.Role(member.Role).Can(perm) {
            c.AbortWithStatusJSON(403, gin.H{
                "error": "insufficient_permissions",
                "required": perm,
            })
            return
        }
        c.Next()
    }
}

// Applied at route registration
campaigns := r.Group("/campaigns")
campaigns.Use(middleware.RequirePermission(auth.PermCreateCampaign))
campaigns.POST("/", handlers.CreateCampaign)

None of this existed in the original codebase. The original had a simple admin boolean on the users table. That's fine for a self-hosted tool where there's one team. It's not enough for a SaaS where an agency owner needs to give their intern read-only access to campaign analytics.


The Feature Flag Hack That Saved Me (And The Technical Debt It Created)

Here's where I made a pragmatic call I'm still not sure was right.

I didn't want to break the original self-hosted use case. I'm contributing back to the open-source project — I care about that. So I introduced a mode config flag:

// config/config.go
type AppConfig struct {
    Mode string `env:"APP_MODE" default:"self-hosted"` // "saas" | "self-hosted"
    // ...
}

func IsSaaS() bool {
    return Get().Mode == "saas"
}

// Used throughout the service layer
func (s *CampaignService) GetCampaigns(ctx context.Context, userID uint) ([]Campaign, error) {
    if config.IsSaaS() {
        orgID := ctx.Value("org_id").(uint)
        return s.repo.GetByOrg(orgID, userID)
    }
    return s.repo.GetByUser(userID)
}

In self-hosted mode: the app works exactly as before. No org concepts, no tenant isolation, no role checks. In saas mode: the org layer kicks in at every level — queries, writes, middleware, billing checks.

This works. But it means two mental models running in the same codebase. Every time I add a new feature, I have to ask:

  • Is this org_id-scoped or user_id-scoped?
  • Does the self-hosted path need this feature at all?
  • If it only makes sense in SaaS mode, am I guarding it properly?
  • Am I going to introduce a regression in the mode I'm not actively testing right now?

The database now has columns that are nullable in self-hosted mode and non-nullable in SaaS mode. Some indexes only get used in one mode. Some foreign keys only matter in one mode. The schema is a bifurcated thing — half of it answering one set of questions, half answering another.

Was there a better way? Probably. Did I have the time for it? No. Am I paying for it in confusion tax every week? Yes.

The cleaner alternative — which I couldn't afford to build at the time — would have been separate database views per mode, with the shared physical tables underneath:

-- Views that self-hosted queries hit (no org concept at all)
CREATE VIEW v_campaigns_selfhosted AS
    SELECT id, user_id, name, status, scheduled_at, created_at
    FROM campaigns
    WHERE org_id IS NULL;

-- Views that SaaS queries hit (org-scoped, always filtered)
CREATE VIEW v_campaigns_saas AS
    SELECT id, org_id, user_id, name, status, scheduled_at, created_at
    FROM campaigns
    WHERE org_id IS NOT NULL;

The repository layer picks a view based on mode at initialization time. The physical schema stays unified, but each access pattern sees a coherent, mode-appropriate view of the data. No if config.IsSaaS() scattered through 200 service functions.
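In code, that initialization-time choice can stay very small. A sketch (the view names match the SQL above; the repository type is mine, not the project's):

```go
package main

import "fmt"

// CampaignRepo resolves its source table once, at construction time,
// so individual query methods never branch on mode again
type CampaignRepo struct {
	table string
}

func NewCampaignRepo(mode string) *CampaignRepo {
	t := "v_campaigns_selfhosted"
	if mode == "saas" {
		t = "v_campaigns_saas"
	}
	return &CampaignRepo{table: t}
}

func main() {
	fmt.Println(NewCampaignRepo("saas").table)        // v_campaigns_saas
	fmt.Println(NewCampaignRepo("self-hosted").table) // v_campaigns_selfhosted
}
```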

Maybe I'll refactor toward this eventually. Maybe I won't. Right now the mode flag is load-bearing and touching it feels dangerous.


The Specific Night I Understood What "Not Designed For This" Means

There was one night — I remember it clearly — where I was trying to add campaign-level quota limits for the plan system. Agencies on a Basic plan get 5 active campaigns per month. Pro plan gets 25. Agency plan gets unlimited.

Simple feature. Should take an afternoon.

I opened the campaigns table. The user_id foreign key was there. The org_id I'd added was there. But the plan limits needed to live at the org level, and campaigns were being created with only a user_id check in the service layer.

To enforce the limit correctly, the write path needed to:

  1. Check the org's current plan
  2. Count active campaigns across all users in that org — not just the user making the request
  3. Reject the write if over quota
  4. Do all of this atomically, without a race condition where two users hit the limit simultaneously and both sneak through

Here's what that looked like once it was all in one transaction:
func (s *CampaignService) CreateCampaign(
    ctx context.Context,
    input CreateCampaignInput,
) (*Campaign, error) {
    orgID := ctx.Value("org_id").(uint)

    return s.repo.WithTransaction(func(tx *gorm.DB) (*Campaign, error) {
        // Lock the org row — prevents concurrent writes from bypassing the limit
        // Two users in the same org hitting this simultaneously will serialize here
        var org Organization
        if err := tx.Clauses(clause.Locking{Strength: "UPDATE"}).
            First(&org, orgID).Error; err != nil {
            return nil, fmt.Errorf("fetching org: %w", err)
        }

        // Count active campaigns across the ENTIRE org, not just this user
        var activeCampaignCount int64
        if err := tx.Model(&Campaign{}).
            Where("org_id = ? AND status NOT IN ('deleted', 'completed')", orgID).
            Count(&activeCampaignCount).Error; err != nil {
            return nil, fmt.Errorf("counting campaigns: %w", err)
        }

        // Check the org's plan limit
        limit := s.plans.CampaignLimit(org.PlanID)
        if limit != PlanUnlimited && activeCampaignCount >= int64(limit) {
            return nil, &QuotaExceededError{
                Resource: "campaigns",
                Current:  activeCampaignCount,
                Limit:    int64(limit),
                PlanName: org.PlanName,
            }
        }

        // All checks passed — create the campaign
        campaign := &Campaign{
            OrgID:  orgID,
            UserID: input.UserID,
            Name:   input.Name,
            Status: CampaignStatusDraft,
        }
        if err := tx.Create(campaign).Error; err != nil {
            return nil, fmt.Errorf("creating campaign: %w", err)
        }

        return campaign, nil
    })
}

The database had none of the scaffolding for this. No aggregate table. No counter cache. No org-level write lock pattern. No plans table with limit definitions. No structured quota error types. I had to build all of it from scratch — and then I had to think through every other resource (contacts, WhatsApp accounts, API keys, message templates) and decide whether they also needed quota enforcement, and if so, whether the same pattern applied.

The original developers never needed any of that. Their users weren't organizations. Their quota was "as many as you want, it's your server."

I sat with that for a while. Not frustrated — just genuinely humbled. The database was doing exactly what it was built to do. I was the one who showed up with different requirements and expected it to just adapt.


The Billing Integration Problem I Didn't See Coming

Once you have organizations and plans, you need billing. Once you have billing, you need to keep your database and your billing provider (I'm using Razorpay, because India) synchronized. And that synchronization is harder than it sounds.

The problem: a customer can cancel their plan in Razorpay's dashboard. Razorpay sends a webhook. Your database still thinks they're on the Pro plan. For a window of time — however long it takes your webhook handler to process — you're serving a paying feature to a customer who just cancelled.

The naive approach is to update the plan_id on the org when the webhook fires. But webhooks fail. Webhooks arrive out of order. Webhooks get retried. The billing state in your database and the billing state in Razorpay can drift, and reconciling that drift requires infrastructure the original codebase had no concept of.

What I ended up building:

-- Subscription state lives in its own table, separate from the org
CREATE TABLE subscriptions (
    id                  BIGSERIAL PRIMARY KEY,
    org_id              BIGINT NOT NULL REFERENCES organizations(id),
    razorpay_sub_id     VARCHAR(255) NOT NULL UNIQUE,
    plan_id             BIGINT NOT NULL REFERENCES plans(id),
    status              VARCHAR(50) NOT NULL,
    -- 'active' | 'past_due' | 'cancelled' | 'paused'
    current_period_start TIMESTAMP NOT NULL,
    current_period_end   TIMESTAMP NOT NULL,
    cancel_at_period_end BOOLEAN NOT NULL DEFAULT FALSE,
    cancelled_at        TIMESTAMP,
    created_at          TIMESTAMP DEFAULT NOW(),
    updated_at          TIMESTAMP DEFAULT NOW()
);

-- Webhook events get stored before processing (idempotency)
CREATE TABLE billing_events (
    id              BIGSERIAL PRIMARY KEY,
    event_id        VARCHAR(255) NOT NULL UNIQUE, -- Razorpay event ID
    event_type      VARCHAR(100) NOT NULL,
    payload         JSONB NOT NULL,
    processed_at    TIMESTAMP,
    processing_error TEXT,
    created_at      TIMESTAMP DEFAULT NOW()
);

-- event_id is already indexed by its UNIQUE constraint, so only the
-- processing queue needs an explicit index
CREATE INDEX idx_billing_events_processed  ON billing_events(processed_at)
    WHERE processed_at IS NULL;  -- Partial index — only unprocessed events

The webhook handler stores the event first, then processes it:

func (h *BillingWebhookHandler) HandleEvent(c *gin.Context) {
    // 1. Verify Razorpay signature
    if !h.billing.VerifyWebhookSignature(c.Request) {
        c.Status(401)
        return
    }

    var payload razorpay.WebhookPayload
    if err := c.ShouldBindJSON(&payload); err != nil {
        c.Status(400)
        return
    }

    // 2. Store event for idempotent processing
    // If this event_id already exists, the insert is a no-op
    // (ON CONFLICT DO NOTHING in Postgres) — Razorpay retries deliveries
    event := &BillingEvent{
        EventID:   payload.EventID,
        EventType: payload.EventType,
        Payload:   payload,
    }
    if err := h.repo.StoreEventIdempotent(event); err != nil {
        // Already processed — return 200 so Razorpay stops retrying.
        // (This sketch also swallows real storage errors; a production
        // handler should distinguish duplicates from failures.)
        c.Status(200)
        return
    }

    // 3. Acknowledge immediately — process async
    c.Status(200)

    // 4. Process in background goroutine
    go h.processor.ProcessEvent(event)
}

None of this architecture existed in the original codebase because the original had no concept of subscriptions, billing, or recurring payments. It's not a gap in the original — it's a feature it never needed. But it's infrastructure I had to build from scratch, in a codebase that had no natural place for it, while also keeping the self-hosted path working without any billing dependency.


What I'm Doing Differently Now

1. Read the schema before you read the code

When you fork something or onboard onto a codebase, the schema tells you the mental model of whoever built it. Table names, nullable decisions, what's indexed and what isn't — it's all signal. Read it like a design document, not like a starting point.

Specifically, look for:

  • What is the ownership root — what table does everything else hang off of?
  • What columns are absent that you'd expect for your use case? (For me: org_id, tenant_id, plan_id, role)
  • What's nullable that shouldn't be in your use case?
  • What indexes exist — they tell you what query patterns were optimized for, which reveals what the original author expected to be hot paths

2. Before writing a migration, write your data access policy

org_id is not a solution. It's the beginning of a decision. The real work is auditing every query and every write path against a policy you've written in plain English first:

## Data Access Policy

- A user can only see data belonging to their organization
- A user from Org A can never see, modify, or be aware of data from Org B
- Admin role: can manage org members and all org resources
- Member role: can manage campaigns and contacts but not members or billing
- Viewer role: read-only access to campaigns and contacts
- Shared data (plans, languages, countries): visible to all authenticated users regardless of org
- Billing events: visible only to Owner and Admin roles

That document becomes the spec your migration, your repository layer, your middleware, and your tests all have to implement. Without it, you're making ad-hoc decisions at every query and hoping they add up to a consistent policy.

3. Build the roles table before you think you need it

You will need it. Build it upfront. It's much cheaper to add role checking to 20 routes at the start than to retrofit it into 80 routes six months later when your first enterprise customer asks for fine-grained access control.

4. Treat billing state as its own bounded context

Don't put subscription status directly on the org table. Subscription state changes for reasons outside your control (payment failures, manual overrides in the billing dashboard, proration events). Keeping it in a dedicated table with its own event log gives you idempotency, auditability, and the ability to replay webhook history when something goes wrong — and something always goes wrong.

5. Feature flags that split schema assumptions create long-term confusion

My mode config works, but it means I can never fully reason about the schema in a single mental model. The confusion tax compounds. Every new feature, every new query, every new test — I have to think about both modes. If I were doing this again, I'd push harder for a views-based abstraction from the start.

6. Write down things that feel obvious

The "one org = one installation" assumption was never written down because it was obvious to everyone building the original tool. The moment you fork a project for a different use case, those obvious assumptions become invisible traps.

I now keep a SCHEMA_ASSUMPTIONS.md in the repo:

# Schema Assumptions

## Tenancy model
- SaaS mode: `org_id` is the isolation boundary. No query should ever cross org boundaries.
- Self-hosted mode: `org_id` is always NULL. User is the isolation boundary.
- Mixed mode: Not supported. An installation is either SaaS or self-hosted.

## Roles
- Roles are enforced at the middleware layer, not in the database.
- Database does not enforce role-based read restrictions — only application does.
- Revisit with RLS (Row-Level Security) if compliance requirements change.

## Plan limits
- Limits are enforced at write time with SELECT FOR UPDATE on the org row.
- Counter caches are NOT used — count queries are cheap enough at current scale.
- Revisit counter caches if orgs exceed ~50,000 contacts or ~10,000 campaigns.

## Billing state
- Subscription status in `subscriptions` table is the source of truth.
- `organizations.plan_id` is a denormalized cache — refreshed on subscription events.
- Never read plan limits directly from `organizations.plan_id` without checking `subscriptions.status`.

## User ownership
- A user belongs to exactly one org in SaaS mode, or no org in self-hosted mode.
- Users cannot be transferred between orgs without a manual data migration.
- Deleting an org cascades to org_members but NOT to user accounts (users persist).

Future me thanks present me every time I open that file.


Things I Still Haven't Solved

I want to be honest: I'm not writing this from a position of having figured everything out. There are open problems I'm sitting with:

Row-Level Security in Postgres. The "right" way to enforce tenant isolation at the database level is Postgres RLS — policies on the tables themselves that filter by org_id automatically, so even a query that accidentally omits the org_id filter can't return another org's data. I haven't implemented this yet. Application-level enforcement is working and I haven't had a bug, but it's one misplaced query away from a data leak. It's on the list.
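For reference, the shape of that setup in Postgres: the app binds the authenticated org to the connection via a custom setting, and a policy filters every statement against it. A sketch, with the setting name being my own choice:

```sql
-- Once per table that needs tenant isolation
ALTER TABLE contacts ENABLE ROW LEVEL SECURITY;

-- Superusers and the table owner bypass RLS unless this is also set
ALTER TABLE contacts FORCE ROW LEVEL SECURITY;

-- Applies to reads and writes: rows outside the current org are invisible,
-- even to queries that forget their org_id filter
CREATE POLICY org_isolation ON contacts
    USING (org_id = current_setting('app.current_org_id')::bigint);

-- The application runs this after resolving the authenticated org;
-- set_config(..., true) scopes it to the current transaction instead
SET app.current_org_id = '42';
```

The catch for connection-pooled apps is that the setting must be set (and reset) per checkout or per transaction, or requests can inherit another tenant's value.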

The self-hosted ↔ SaaS migration path. What happens when a self-hosted user wants to move to the managed SaaS? Currently: manual. Someone (me) exports their data, transforms it, imports it to the SaaS instance. That's not a business. I need a proper migration tool and I don't have one yet.

Soft deletes at the org level. When an org cancels and their data enters a grace period before deletion, how do I handle queries that should or shouldn't include "cancelled org" data? Right now it's a status flag and a bunch of WHERE status != 'cancelled' clauses. It's messy.


The Broader Thing

I'm a solo developer, about six months from finishing my degree, building this on top of an open-source fork in my spare hours. I'm not a staff engineer at a FAANG company writing a blog post from authority. I'm someone who ran into a wall and had to figure out what it was made of — repeatedly, across six months of slow, sometimes painful learning.

Nobody writes about the friction of forking someone else's assumptions. Everyone writes about the clean green-field system design. The reality of building something real, on a deadline, on top of someone else's excellent work, is messier and more interesting than that. It's also more instructive, because the constraints force you to understand things you'd gloss over if you were starting from scratch and could just "design it properly."

The database wasn't designed for what I wanted to build. But the tools to renegotiate that contract were always there — migrations, transactions, views, careful query auditing, explicit policy documents. I just had to stop assuming I could skip that conversation.


I'm building Whatomate — an open-source WhatsApp Business API platform — and a SaaS layer on top of it targeting Indian digital marketing agencies. If you're doing something similar, have opinions on multi-tenancy patterns in forked codebases, or have already solved the RLS or soft-delete problems I mentioned, I'd genuinely love to hear from you in the comments.

Follow for more honest write-ups from the trenches of solo SaaS building in India.
