Ray
Scaling a Self-Hosted Billing Monitor from 1 to 100 Tenants

When I built BillingWatch, I designed it for a single Stripe account. Then people started asking about multi-tenant support. Here's what the architecture looks like when you scale from 1 to 100 tenants.

The Core Multi-Tenancy Pattern

The simplest approach: row-level tenant isolation, with a tenant_id column on every table. BillingWatch uses FastAPI + SQLite (upgradeable to Postgres), with every query scoped to the requesting tenant:

from fastapi import Depends, FastAPI, Header, HTTPException
from sqlalchemy.orm import Session

app = FastAPI()

def get_tenant_id(x_tenant_id: str = Header(...)):
    if not x_tenant_id:
        raise HTTPException(status_code=401, detail="Missing tenant header")
    return x_tenant_id

@app.get("/anomalies")
def list_anomalies(tenant_id: str = Depends(get_tenant_id), db: Session = Depends(get_db)):
    return db.query(Anomaly).filter(Anomaly.tenant_id == tenant_id).all()

Every webhook endpoint, every query, every dashboard call is scoped this way. It's boring and it works.
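The Anomaly model itself isn't shown above; here's a minimal sketch of what a tenant-scoped table looks like. The column names beyond tenant_id are my assumptions, not BillingWatch's actual schema:

```python
from datetime import datetime, timezone

from sqlalchemy import Column, DateTime, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Anomaly(Base):
    __tablename__ = "anomalies"
    id = Column(String, primary_key=True)
    tenant_id = Column(String, nullable=False, index=True)  # every table carries this
    description = Column(String)
    created_at = Column(DateTime, default=lambda: datetime.now(timezone.utc))

# In-memory demo: two tenants, and a scoped query never crosses the boundary
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
db = sessionmaker(bind=engine)()
db.add_all([
    Anomaly(id="a1", tenant_id="t1", description="spend spike"),
    Anomaly(id="a2", tenant_id="t2", description="refund surge"),
])
db.commit()

rows = db.query(Anomaly).filter(Anomaly.tenant_id == "t1").all()
```

The index on tenant_id matters: once every query carries that filter, it's the column your query planner leans on most.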

Webhook Routing at Scale

The tricky part is routing Stripe webhooks when you have 100 different Stripe accounts. The clean solution: each tenant gets their own webhook endpoint with a unique path token:

import stripe
from fastapi import Request

@app.post("/webhook/{tenant_token}")
async def stripe_webhook(tenant_token: str, request: Request, db: Session = Depends(get_db)):
    tenant = db.query(Tenant).filter(Tenant.webhook_token == tenant_token).first()
    if not tenant:
        raise HTTPException(status_code=404)

    # Verify Stripe signature against THIS tenant's secret
    payload = await request.body()
    sig = request.headers.get("stripe-signature")
    try:
        event = stripe.Webhook.construct_event(payload, sig, tenant.webhook_secret)
    except stripe.error.SignatureVerificationError:
        raise HTTPException(status_code=400)

    process_event(event, tenant.id)
    return {"status": "ok"}

This means each tenant registers https://yourdomain.com/webhook/<unique-token> in their Stripe dashboard. Clean separation, no shared secrets.
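Generating those path tokens is a one-liner with the standard library. The function name here is illustrative, not from the BillingWatch source:

```python
import secrets

def generate_webhook_token() -> str:
    # 32 bytes of randomness -> ~43 URL-safe characters; unguessable as a path segment
    return secrets.token_urlsafe(32)

token = generate_webhook_token()
url = f"https://yourdomain.com/webhook/{token}"
```

Use secrets rather than random or uuid4's hex: the token is effectively a credential, so it should come from a CSPRNG.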

Deduplication: Don't Process Twice

Stripe retries webhooks on failure. At 100 tenants you'll hit duplicate events. A simple idempotency table:

from datetime import datetime

from sqlalchemy import Column, DateTime, String

class ProcessedEvent(Base):
    __tablename__ = "processed_events"
    event_id = Column(String, primary_key=True)
    tenant_id = Column(String, nullable=False)
    processed_at = Column(DateTime, default=datetime.utcnow)

def process_event(event, tenant_id):
    existing = db.query(ProcessedEvent).filter_by(
        event_id=event.id, tenant_id=tenant_id
    ).first()
    if existing:
        return  # already processed

    # ... process the event ...

    db.add(ProcessedEvent(event_id=event.id, tenant_id=tenant_id))
    db.commit()  # the event_id primary key also catches a concurrent duplicate as an IntegrityError

Add a composite index on (event_id, tenant_id) and this query is fast even at millions of rows.
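In SQLAlchemy, that composite index can be declared on the model via __table_args__. The model is restated here so the snippet stands alone:

```python
from sqlalchemy import Column, DateTime, Index, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class ProcessedEvent(Base):
    __tablename__ = "processed_events"
    event_id = Column(String, primary_key=True)
    tenant_id = Column(String, nullable=False)
    processed_at = Column(DateTime)
    __table_args__ = (
        # covers the (event_id, tenant_id) lookup in process_event
        Index("ix_processed_events_event_tenant", "event_id", "tenant_id"),
    )
```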

Database Considerations

For 100 tenants with moderate webhook volume, SQLite holds up fine if you enable WAL mode:

from sqlalchemy import event
from sqlalchemy.engine import Engine

@event.listens_for(Engine, "connect")
def set_sqlite_pragma(dbapi_connection, connection_record):
    cursor = dbapi_connection.cursor()
    cursor.execute("PRAGMA journal_mode=WAL")
    cursor.execute("PRAGMA synchronous=NORMAL")
    cursor.close()

For higher scale, the row-level isolation pattern migrates cleanly to Postgres — the application code doesn't change, just the connection string.
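A sketch of what migration-by-connection-string looks like; DATABASE_URL is an assumed environment variable name, not necessarily what BillingWatch uses:

```python
import os

from sqlalchemy import create_engine

# Same models, same tenant-scoped queries; only the URL changes.
# Unset: local SQLite file. Set to e.g. postgresql://user:pass@host/billingwatch: Postgres.
db_url = os.environ.get("DATABASE_URL", "sqlite:///billingwatch.db")
engine = create_engine(db_url)
```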

Rate Limiting Per Tenant

You don't want one tenant flooding your webhook processor. A simple in-memory rate limiter per tenant_id:

from collections import defaultdict
from time import time

from fastapi import HTTPException

request_counts = defaultdict(list)

def check_rate_limit(tenant_id: str, limit=100, window=60):
    now = time()
    request_counts[tenant_id] = [t for t in request_counts[tenant_id] if now - t < window]
    if len(request_counts[tenant_id]) >= limit:
        raise HTTPException(status_code=429, detail="Rate limit exceeded")
    request_counts[tenant_id].append(now)

What I'd Do Differently

  1. Start with row-level isolation from day one — retrofitting is painful
  2. Use unique webhook tokens immediately, not shared secrets
  3. Add the deduplication table before you need it — much easier than adding later

The full BillingWatch source, including the multi-tenant webhook handling, is at github.com/rmbell09-lang/billingwatch.

Anything you'd handle differently at scale? I'm particularly curious about Postgres schema-per-tenant vs row-level at 1000+ tenants.
