A deep dive into database migration patterns, normalization strategies, and leveraging Supabase's powerful features when moving from document stores to a modern PostgreSQL platform.
TL;DR
We migrated a production system from MongoDB to Supabase:
| Metric | Value |
|---|---|
| Collections Migrated | 65 |
| Tables Created | 282 |
| Rows Migrated | ~80,000 |
| Foreign Keys | 217 |
| Indexes | 491 |
| RLS Policies | 156 |
| Migration Approach | Zero downtime |
This guide covers the complete journey—from understanding why we chose Supabase, through the technical implementation, to leveraging Supabase's unique features post-migration.
Table of Contents
- Why We Chose Supabase
- The Challenge: Document vs Relational Paradigm
- Architecture Overview
- Normalization Strategies
- Data Type Mapping
- The Transformation Engine
- Handling Complex Patterns
- Row Level Security for Multi-Tenancy
- Indexing Strategy
- Foreign Key Strategy
- Leveraging Supabase Features
- Error Handling & Data Quality
- Performance Optimizations
- Lessons Learned
- Conclusion
Why We Chose Supabase
The Business Case
After months of running MongoDB in production, several pain points emerged. We evaluated multiple options and chose Supabase for these reasons:
1. ACID Compliance with Developer Experience
Supabase gives us PostgreSQL's rock-solid transactional guarantees wrapped in a modern developer experience. The dashboard, client libraries, and tooling made the transition smoother than raw PostgreSQL.
2. Built-in Row Level Security (RLS)
Our multi-tenant SaaS needed tenant isolation. MongoDB required application-level enforcement everywhere. Supabase's RLS policies enforce isolation at the database level—impossible to bypass.
-- One policy, complete tenant isolation
CREATE POLICY tenant_isolation ON entities
USING (organization_id = (auth.jwt()->>'org_id')::uuid);
3. Realtime Subscriptions
We had a custom change stream implementation on MongoDB. Supabase Realtime gave us this out of the box with WebSocket support and client library integration.
4. Complex Reporting Needs
Our analytics team struggled with MongoDB aggregations. Reports that are a few lines of standard SQL in PostgreSQL required complex multi-stage aggregation pipelines in MongoDB. With Supabase, we can write plain SQL and even use the built-in pg_graphql extension.
5. Integrated Auth
Supabase Auth integrates seamlessly with RLS. User context flows from authentication through to database policies without custom middleware.
6. Cost Optimization
MongoDB's memory-mapped storage consumed significantly more RAM. Supabase's pricing model (based on PostgreSQL) was more predictable and cost-effective for our workload.
7. Edge Functions & Storage
Moving to Supabase meant we could consolidate our file storage (previously S3) and serverless functions into one platform.
The Challenge: Document vs Relational Paradigm
Before diving into implementation, it's crucial to understand the fundamental paradigm shift we're dealing with.
MongoDB's Document Model
A typical MongoDB document looks like this:
{
"_id": "507f1f77bcf86cd799439011",
"name": "Enterprise Account",
"contacts": [
{ "email": "alice@example.com", "role": "admin" },
{ "email": "bob@example.com", "role": "viewer" }
],
"settings": {
"theme": "dark",
"notifications": {
"email": true,
"sms": false
}
},
"tags": ["premium", "enterprise", "active"],
"customFields": {
"industry": "Technology",
"employeeCount": 500
}
}
This single document contains:
- Nested objects (settings, notifications)
- Arrays of objects (contacts)
- Simple arrays (tags)
- Dynamic fields (customFields with arbitrary keys)
Supabase's Relational Model
The same data in Supabase requires multiple tables:
-- Main table
CREATE TABLE accounts (
id TEXT PRIMARY KEY,
organization_id UUID NOT NULL REFERENCES organizations(id),
name TEXT NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- Enable RLS
ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;
-- 1:1 relationship for nested object
CREATE TABLE account_settings (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
account_id TEXT REFERENCES accounts(id) ON DELETE CASCADE,
theme TEXT DEFAULT 'light',
UNIQUE(account_id)
);
-- Deeper nesting
CREATE TABLE account_notification_settings (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
settings_id UUID REFERENCES account_settings(id) ON DELETE CASCADE,
email BOOLEAN DEFAULT true,
sms BOOLEAN DEFAULT false,
UNIQUE(settings_id)
);
-- Array of objects
CREATE TABLE account_contacts (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
account_id TEXT REFERENCES accounts(id) ON DELETE CASCADE,
email TEXT NOT NULL,
role TEXT,
ordinal INTEGER NOT NULL -- Preserves array order
);
-- Simple array
CREATE TABLE account_tags (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
account_id TEXT REFERENCES accounts(id) ON DELETE CASCADE,
tag TEXT NOT NULL,
ordinal INTEGER NOT NULL
);
-- Dynamic fields (key-value pattern)
CREATE TABLE account_custom_fields (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
account_id TEXT REFERENCES accounts(id) ON DELETE CASCADE,
field_key TEXT NOT NULL,
field_value TEXT,
value_type TEXT NOT NULL -- For type preservation
);
One document became six tables. Multiply that pattern across the whole schema and you can see how 65 collections became 282 tables.
Architecture Overview
Our migration architecture followed a three-phase approach:
Phase 1: Schema Creation
We used Supabase CLI migrations to manage our schema:
# Create a new migration
supabase migration new create_entities_table
# Apply migrations
supabase db push
Example migration file (supabase/migrations/01_entities.sql):
-- Create main table
CREATE TABLE IF NOT EXISTS entities (
id TEXT PRIMARY KEY,
organization_id UUID NOT NULL,
name TEXT,
status TEXT DEFAULT 'active',
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- Enable RLS (required for Supabase security model)
ALTER TABLE entities ENABLE ROW LEVEL SECURITY;
-- Create child table for metadata
CREATE TABLE IF NOT EXISTS entity_metadata (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
entity_id TEXT REFERENCES entities(id) ON DELETE CASCADE,
key TEXT NOT NULL,
value TEXT,
ordinal INTEGER DEFAULT 0
);
ALTER TABLE entity_metadata ENABLE ROW LEVEL SECURITY;
Phase 2: Data Migration
The bulk data migration reads from BSON backup files (not live MongoDB) to avoid impacting production, and writes to Supabase over its direct PostgreSQL connection string.
Phase 3: Constraints & Security
Foreign keys, indexes, and RLS policies are created after all data is loaded. This is critical for performance.
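A condensed sketch of that post-load ordering (constraint and policy names here are illustrative, not the exact ones we used):
-- 1. Foreign keys, after orphan cleanup
ALTER TABLE entity_metadata
  ADD CONSTRAINT fk_entity_metadata_entity
  FOREIGN KEY (entity_id) REFERENCES entities(id) ON DELETE CASCADE;
-- 2. Indexes, including the columns RLS will filter on
CREATE INDEX IF NOT EXISTS idx_entities_org ON entities(organization_id);
-- 3. RLS policies last, once the supporting indexes exist
CREATE POLICY tenant_isolation ON entities
  FOR ALL
  USING (organization_id = (auth.jwt()->>'org_id')::uuid);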
Normalization Strategies
This section is the heart of the migration. Understanding these patterns will help you tackle any document-to-relational transformation.
First Normal Form: Eliminating Arrays
The Problem: MongoDB arrays have no place in a normalized relational schema. PostgreSQL does offer array types, but they break first normal form and complicate joins and constraints.
The Solution: Create child tables with an ordinal column to preserve order.
Before (MongoDB):
{
"_id": "123",
"name": "Widget",
"categories": ["electronics", "gadgets", "sale"]
}
After (Supabase):
-- products table
| id | name |
|-----|--------|
| 123 | Widget |
-- product_categories table
| id | product_id | category | ordinal |
|--------------------------------------|------------|-------------|---------|
| a1b2c3d4-e5f6-7890-abcd-ef1234567890 | 123 | electronics | 0 |
| b2c3d4e5-f6a7-8901-bcde-f12345678901 | 123 | gadgets | 1 |
| c3d4e5f6-a7b8-9012-cdef-123456789012 | 123 | sale | 2 |
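A minimal DDL sketch for the child table behind those rows (table and column names simply follow the example above):
-- Child table for the categories array
CREATE TABLE product_categories (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  product_id TEXT REFERENCES products(id) ON DELETE CASCADE,
  category TEXT NOT NULL,
  ordinal INTEGER NOT NULL -- Preserves the original array order
);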
Transformation Code Pattern:
def transform_array_field(parent_id: str, array: list, field_name: str) -> list:
"""Transform a MongoDB array into rows for a Supabase child table."""
rows = []
for ordinal, value in enumerate(array):
rows.append({
"parent_id": parent_id,
field_name: value,
"ordinal": ordinal
})
return rows
Second Normal Form: Extracting Nested Objects
The Problem: Embedded documents in MongoDB need their own tables.
The Solution: Create child tables with foreign keys. Use UNIQUE constraints for 1:1 relationships.
Naming Convention: parent_nested_field (e.g., user_profile, order_shipping_address)
Before (MongoDB):
{
"_id": "user_456",
"email": "user@example.com",
"profile": {
"bio": "Software engineer",
"avatar": "https://example.com/avatar.jpg",
"social": {
"twitter": "@user",
"github": "user"
}
}
}
After (Supabase):
-- users table
| id | email |
|----------|-------------------|
| user_456 | user@example.com |
-- user_profiles table (1:1 with UNIQUE constraint)
| id | user_id | bio | avatar |
|-----|----------|-------------------|-------------------------------|
| uuid| user_456 | Software engineer | https://example.com/avatar.jpg|
-- user_profile_social table (1:1 with profile)
| id | profile_id | twitter | github |
|-----|------------|---------|--------|
| uuid| uuid | @user | user |
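To reassemble the original document shape, the joins look roughly like this (a sketch assuming the table and column names above):
SELECT
  u.id,
  u.email,
  p.bio,
  p.avatar,
  s.twitter,
  s.github
FROM users u
LEFT JOIN user_profiles p ON p.user_id = u.id            -- 1:1 via UNIQUE(user_id)
LEFT JOIN user_profile_social s ON s.profile_id = p.id   -- 1:1 via UNIQUE(profile_id)
WHERE u.id = 'user_456';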
Third Normal Form: Transitive Dependencies
The Problem: Multi-level nesting creates transitive dependencies.
The Solution: Each level gets its own table with references only to its immediate parent.
Example: 4-Level Deep Structure
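The concrete document isn't reproduced here, but the resulting table chain looks like this hypothetical four-level sketch, where each table references only its immediate parent:
CREATE TABLE orders (
  id TEXT PRIMARY KEY
  -- level 1 scalar columns omitted
);
CREATE TABLE order_shipping (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  order_id TEXT REFERENCES orders(id) ON DELETE CASCADE,
  UNIQUE(order_id) -- level 2, 1:1 with its parent
);
CREATE TABLE order_shipping_address (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  shipping_id UUID REFERENCES order_shipping(id) ON DELETE CASCADE,
  UNIQUE(shipping_id) -- level 3
);
CREATE TABLE order_shipping_address_geo (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  address_id UUID REFERENCES order_shipping_address(id) ON DELETE CASCADE,
  UNIQUE(address_id) -- level 4
);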
Transformation Pattern:
def flatten_nested_structure(doc: dict, table_name: str, parent_id: str = None, depth: int = 0) -> dict:
    """Recursively flatten nested structures into separate table rows."""
    results = {}
    # Extract scalar fields for the current level
    current_row = {k: v for k, v in doc.items()
                   if not isinstance(v, (dict, list))}
    if parent_id:
        current_row["parent_id"] = parent_id
    # Children reference this row's id (fall back to the Mongo _id)
    row_id = current_row.get("id") or str(doc.get("_id", ""))
    results[table_name] = [current_row]
    # Process nested objects
    for key, value in doc.items():
        if isinstance(value, dict):
            child_table = f"{table_name}_{key}"
            child_results = flatten_nested_structure(
                value,
                child_table,
                parent_id=row_id,
                depth=depth + 1
            )
            for name, rows in child_results.items():
                results.setdefault(name, []).extend(rows)
        elif isinstance(value, list):
            # Handle arrays (covered in 1NF section)
            pass
    return results
Handling Dynamic/Schemaless Data
The Problem: MongoDB allows arbitrary fields. Supabase requires defined columns.
The Solution: Key-value pattern with type discriminators, or JSONB columns for query-heavy dynamic data.
Option 1: Key-Value Table (Better for simple lookups)
CREATE TABLE entity_custom_fields (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
entity_id TEXT REFERENCES entities(id) ON DELETE CASCADE,
field_path TEXT NOT NULL, -- Dot notation for nested: "address.city"
field_key TEXT NOT NULL, -- Final key name
field_value TEXT, -- Stored as text
value_type TEXT NOT NULL, -- 'string', 'number', 'boolean', 'null'
ordinal INTEGER DEFAULT 0
);
-- Index for efficient lookups
CREATE INDEX idx_entity_custom_fields_lookup
ON entity_custom_fields(entity_id, field_key);
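A hedged query sketch against the key-value table, using the value_type discriminator so the cast is safe:
SELECT e.*
FROM entities e
JOIN entity_custom_fields f ON f.entity_id = e.id
WHERE f.field_key = 'employeeCount'
  AND f.value_type = 'number'
  AND f.field_value::numeric > 100;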
Option 2: JSONB Column (Better for complex queries)
-- Add JSONB column for flexible querying
ALTER TABLE entities ADD COLUMN custom_data JSONB DEFAULT '{}';
-- Create GIN index for JSONB queries
CREATE INDEX idx_entities_custom_data ON entities USING GIN(custom_data);
-- Query example
SELECT * FROM entities
WHERE custom_data->>'industry' = 'Technology'
AND (custom_data->>'employeeCount')::int > 100;
Transformation Code:
def transform_dynamic_fields(
parent_id: str,
fields: dict,
path_prefix: str = ""
) -> list:
"""Transform dynamic fields into key-value rows with type preservation."""
rows = []
ordinal = 0
for key, value in fields.items():
current_path = f"{path_prefix}.{key}" if path_prefix else key
if isinstance(value, dict):
# Recurse into nested objects
rows.extend(transform_dynamic_fields(parent_id, value, current_path))
else:
# Determine type
if value is None:
value_type = "null"
str_value = None
elif isinstance(value, bool):
value_type = "boolean"
str_value = str(value).lower()
elif isinstance(value, (int, float)):
value_type = "number"
str_value = str(value)
else:
value_type = "string"
str_value = str(value)
rows.append({
"entity_id": parent_id,
"field_path": current_path,
"field_key": key,
"field_value": str_value,
"value_type": value_type,
"ordinal": ordinal
})
ordinal += 1
return rows
Data Type Mapping
Understanding type conversions is critical. Here's our complete mapping for Supabase:
| MongoDB Type | Supabase Type | Conversion Notes |
|---|---|---|
| ObjectId | TEXT | Hex string (24 chars). Consider UUID for new IDs |
| String | TEXT | Direct mapping |
| Int32 | INTEGER | Direct mapping |
| Int64 / Long | BIGINT | Direct mapping |
| Double | DOUBLE PRECISION | Direct mapping |
| Boolean | BOOLEAN | Direct mapping |
| Date | TIMESTAMPTZ | Always timezone-aware in Supabase |
| Null | NULL | Handle explicitly in code |
| Array | Child Table | With ordinal column |
| Embedded Document | Child Table | FK with CASCADE |
| Binary | BYTEA or Supabase Storage | Large files → Storage |
| Decimal128 | NUMERIC | For precise decimals |
| Mixed/Dynamic | JSONB or KV Table | JSONB preferred for queries |
Special Handling: IDs
Supabase prefers UUIDs for new records. For migrated data, we preserve MongoDB ObjectIds as TEXT:
def convert_object_id(oid) -> str:
"""Convert MongoDB ObjectId to string for Supabase."""
if oid is None:
return None
return str(oid) # Returns 24-character hex string
For new tables or columns, use Supabase's UUID generation:
CREATE TABLE new_entities (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
legacy_mongo_id TEXT UNIQUE, -- Preserve old ID for reference
-- ...
);
Special Handling: Dates
Supabase uses TIMESTAMPTZ which is timezone-aware:
from datetime import datetime, timezone
def convert_date(mongo_date) -> datetime:
"""Convert MongoDB date to Supabase timestamptz."""
if mongo_date is None:
return None
if isinstance(mongo_date, datetime):
# Ensure timezone awareness
if mongo_date.tzinfo is None:
return mongo_date.replace(tzinfo=timezone.utc)
return mongo_date
# Handle milliseconds since epoch
return datetime.fromtimestamp(mongo_date / 1000, tz=timezone.utc)
Special Handling: Binary/Files
Large binary data should go to Supabase Storage instead of the database:
from supabase import create_client
supabase = create_client(SUPABASE_URL, SUPABASE_KEY)
def migrate_file(file_data: bytes, filename: str, bucket: str = "migrated-files"):
"""Upload binary data to Supabase Storage."""
path = f"legacy/{filename}"
result = supabase.storage.from_(bucket).upload(path, file_data)
# Return the public URL or path for database reference
return supabase.storage.from_(bucket).get_public_url(path)
The Transformation Engine
BSON Reader Layer
We read directly from BSON backup files rather than querying live MongoDB:
BSON Reading Implementation:
import struct
import bson
def read_bson_file(filepath: str):
"""Generator that yields documents from a BSON file."""
with open(filepath, 'rb') as f:
while True:
# Read 4-byte document length (little-endian)
length_bytes = f.read(4)
if len(length_bytes) < 4:
break # EOF
doc_length = struct.unpack('<i', length_bytes)[0]
# Read rest of document (length includes the 4 bytes we already read)
doc_body = f.read(doc_length - 4)
if len(doc_body) < doc_length - 4:
raise ValueError("Unexpected EOF in BSON file")
# Parse BSON document
full_doc = length_bytes + doc_body
yield bson.decode(full_doc)
Connecting to Supabase for Migration
For bulk migration, connect directly to Supabase's PostgreSQL:
import psycopg2
from urllib.parse import urlparse
# Use the direct database connection string from Supabase dashboard
# Settings > Database > Connection string > URI
DATABASE_URL = "postgresql://postgres.[project-ref]:[password]@aws-0-[region].pooler.supabase.com:6543/postgres"
def get_supabase_connection():
"""Get direct PostgreSQL connection to Supabase."""
return psycopg2.connect(DATABASE_URL)
Recursive Flattening Algorithm
The core transformation logic handles arbitrary nesting depth:
from dataclasses import dataclass, field
from typing import Any
import re
@dataclass
class FlattenResult:
"""Result of flattening a document."""
main_row: dict
child_tables: dict = field(default_factory=dict)
def flatten_document(
doc: dict,
table_name: str,
parent_id: str = None,
parent_key: str = None,
max_depth: int = 10,
current_depth: int = 0
) -> FlattenResult:
"""
Recursively flatten a MongoDB document into Supabase table rows.
Args:
doc: The document to flatten
table_name: Name of the current table
parent_id: ID of the parent row (for child tables)
parent_key: Foreign key column name
max_depth: Maximum nesting depth (prevents infinite recursion)
current_depth: Current recursion depth
Returns:
FlattenResult with main row and child table data
"""
if current_depth > max_depth:
raise ValueError(f"Max nesting depth exceeded: {max_depth}")
result = FlattenResult(main_row={}, child_tables={})
# Handle parent reference
if parent_id and parent_key:
result.main_row[parent_key] = parent_id
# Get document ID
doc_id = str(doc.get('_id', ''))
for key, value in doc.items():
if key == '_id':
result.main_row['id'] = str(value)
continue
if value is None:
result.main_row[to_snake_case(key)] = None
elif isinstance(value, dict):
# Nested object -> child table
child_table = f"{table_name}_{to_snake_case(key)}"
child_result = flatten_document(
value,
child_table,
parent_id=doc_id,
parent_key=f"{table_name}_id",
max_depth=max_depth,
current_depth=current_depth + 1
)
result.child_tables[child_table] = [child_result.main_row]
# Merge nested child tables
for ct_name, ct_rows in child_result.child_tables.items():
result.child_tables.setdefault(ct_name, []).extend(ct_rows)
elif isinstance(value, list):
# Array -> child table with ordinal
child_table = f"{table_name}_{to_snake_case(key)}"
result.child_tables[child_table] = []
for ordinal, item in enumerate(value):
if isinstance(item, dict):
child_result = flatten_document(
item,
child_table,
parent_id=doc_id,
parent_key=f"{table_name}_id",
max_depth=max_depth,
current_depth=current_depth + 1
)
child_result.main_row['ordinal'] = ordinal
result.child_tables[child_table].append(child_result.main_row)
# Merge nested child tables
for ct_name, ct_rows in child_result.child_tables.items():
result.child_tables.setdefault(ct_name, []).extend(ct_rows)
else:
# Simple value array
result.child_tables[child_table].append({
f"{table_name}_id": doc_id,
"value": convert_value(item),
"ordinal": ordinal
})
else:
# Scalar value
result.main_row[to_snake_case(key)] = convert_value(value)
return result
def to_snake_case(name: str) -> str:
"""Convert camelCase to snake_case."""
s1 = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
return re.sub('([a-z0-9])([A-Z])', r'\1_\2', s1).lower()
def convert_value(value: Any) -> Any:
"""Convert MongoDB values to Supabase-compatible values."""
from bson import ObjectId
from datetime import datetime
if isinstance(value, ObjectId):
return str(value)
elif isinstance(value, datetime):
return value # psycopg2 handles datetime
elif isinstance(value, bytes):
return value # BYTEA
elif isinstance(value, bool):
return value
elif isinstance(value, (int, float)):
return value
else:
return str(value) if value is not None else None
Batch Processing Strategy
Inserting one row at a time is catastrophically slow. We batch inserts with configurable thresholds:
Batch Insert Implementation:
from typing import List
from threading import Lock
class BatchInserter:
"""Efficient batch insert manager for Supabase PostgreSQL."""
def __init__(
self,
connection,
table_name: str,
columns: List[str],
batch_size: int = 1000,
on_conflict: str = "DO NOTHING"
):
self.conn = connection
self.table = table_name
self.columns = columns
self.batch_size = batch_size
self.on_conflict = on_conflict
self.buffer: List[tuple] = []
self.total_inserted = 0
self._lock = Lock()
def add(self, row: dict):
"""Add a row to the buffer, flushing if threshold reached."""
values = tuple(row.get(col) for col in self.columns)
self.buffer.append(values)
if len(self.buffer) >= self.batch_size:
self.flush()
def flush(self):
"""Execute batch insert for buffered rows."""
if not self.buffer:
return
with self._lock:
placeholders = ', '.join(['%s'] * len(self.columns))
columns_str = ', '.join(self.columns)
query = f"""
INSERT INTO {self.table} ({columns_str})
VALUES ({placeholders})
ON CONFLICT {self.on_conflict}
"""
with self.conn.cursor() as cur:
cur.executemany(query, self.buffer)
self.total_inserted += len(self.buffer)
self.conn.commit()
self.buffer.clear()
def __enter__(self):
return self
def __exit__(self, exc_type, exc_val, exc_tb):
self.flush() # Flush remaining on context exit
Handling Complex Patterns
Graph Structures (Workflow Nodes/Edges)
Many applications store graph data in documents; workflows with nodes and edges are a common example.
Supabase Schema:
CREATE TABLE workflows (
id TEXT PRIMARY KEY,
organization_id UUID NOT NULL,
name TEXT,
created_at TIMESTAMPTZ DEFAULT NOW()
);
ALTER TABLE workflows ENABLE ROW LEVEL SECURITY;
CREATE TABLE workflow_nodes (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
workflow_id TEXT REFERENCES workflows(id) ON DELETE CASCADE,
node_type TEXT,
position_x DOUBLE PRECISION,
position_y DOUBLE PRECISION,
label TEXT,
config JSONB DEFAULT '{}', -- Flexible node configuration
ordinal INTEGER
);
ALTER TABLE workflow_nodes ENABLE ROW LEVEL SECURITY;
CREATE TABLE workflow_edges (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
workflow_id TEXT REFERENCES workflows(id) ON DELETE CASCADE,
source_node_id TEXT, -- References node by original ID
target_node_id TEXT,
edge_type TEXT,
ordinal INTEGER
);
ALTER TABLE workflow_edges ENABLE ROW LEVEL SECURITY;
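Reassembling a workflow for the UI is then two straightforward queries (a sketch assuming the schema above; 'wf_123' is a placeholder ID):
-- Nodes in their original order
SELECT id, node_type, label, position_x, position_y, config
FROM workflow_nodes
WHERE workflow_id = 'wf_123'
ORDER BY ordinal;
-- Edges connecting those nodes (by the node IDs preserved from the source documents)
SELECT source_node_id, target_node_id, edge_type
FROM workflow_edges
WHERE workflow_id = 'wf_123'
ORDER BY ordinal;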
Multi-Tenant Architecture
Every table includes organization scoping. This is where Supabase's RLS shines (covered in next section):
-- Every table follows this pattern
CREATE TABLE entities (
id TEXT PRIMARY KEY,
organization_id UUID NOT NULL, -- Tenant scoping
-- ... other columns
);
-- RLS policy enforces tenant isolation
CREATE POLICY tenant_isolation ON entities
FOR ALL
USING (organization_id = (auth.jwt()->>'org_id')::uuid);
Audit Trail Preservation
Audit logs work beautifully with Supabase's JSONB support:
CREATE TABLE audit_logs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
organization_id UUID NOT NULL,
resource_type TEXT NOT NULL,
resource_id TEXT NOT NULL,
action TEXT NOT NULL, -- 'create', 'update', 'delete'
actor_id UUID REFERENCES auth.users(id),
actor_type TEXT, -- 'user', 'system', 'api'
timestamp TIMESTAMPTZ DEFAULT NOW(),
changes JSONB, -- Before/after diff
metadata JSONB -- Additional context
);
ALTER TABLE audit_logs ENABLE ROW LEVEL SECURITY;
-- Efficient queries
CREATE INDEX idx_audit_resource ON audit_logs(resource_type, resource_id);
CREATE INDEX idx_audit_org_time ON audit_logs(organization_id, timestamp DESC);
CREATE INDEX idx_audit_changes ON audit_logs USING GIN(changes);
Row Level Security for Multi-Tenancy
This is one of Supabase's killer features. Instead of filtering in every query, we enforce isolation at the database level.
Basic Tenant Isolation Pattern
-- Enable RLS on the table
ALTER TABLE entities ENABLE ROW LEVEL SECURITY;
-- Create policy using JWT claims
CREATE POLICY tenant_isolation ON entities
FOR ALL
USING (
organization_id = (
SELECT (auth.jwt()->>'org_id')::uuid
)
);
Policy Patterns by Operation
-- SELECT: Users see only their organization's data
CREATE POLICY select_own_org ON entities
FOR SELECT
USING (organization_id = (auth.jwt()->>'org_id')::uuid);
-- INSERT: New records must belong to the user's organization
CREATE POLICY insert_own_org ON entities
FOR INSERT
WITH CHECK (organization_id = (auth.jwt()->>'org_id')::uuid);
-- UPDATE: Can only update own organization's records
CREATE POLICY update_own_org ON entities
FOR UPDATE
USING (organization_id = (auth.jwt()->>'org_id')::uuid)
WITH CHECK (organization_id = (auth.jwt()->>'org_id')::uuid);
-- DELETE: Can only delete own organization's records
CREATE POLICY delete_own_org ON entities
FOR DELETE
USING (organization_id = (auth.jwt()->>'org_id')::uuid);
Role-Based Access Within Tenants
-- Admins can do everything within their org
CREATE POLICY admin_full_access ON entities
FOR ALL
USING (
organization_id = (auth.jwt()->>'org_id')::uuid
AND auth.jwt()->>'role' = 'admin'
);
-- Regular users can only view
CREATE POLICY user_read_only ON entities
FOR SELECT
USING (
organization_id = (auth.jwt()->>'org_id')::uuid
AND auth.jwt()->>'role' = 'user'
);
Service Role Bypass for Migration
During migration, use the service role to bypass RLS:
# For migration: use service role key (bypasses RLS)
# This is the "service_role" key from Supabase dashboard
SUPABASE_SERVICE_KEY = "eyJ..."
# Service role connection bypasses RLS
# Use direct database connection with service credentials
Post-Migration: Verify RLS
After migration, verify RLS is working:
-- Check which policies exist
SELECT tablename, policyname, cmd, qual
FROM pg_policies
WHERE schemaname = 'public';
-- Test as a specific user (in Supabase SQL editor)
SET request.jwt.claims = '{"org_id": "org_123", "role": "user"}';
SELECT * FROM entities; -- Should only see org_123's data
Indexing Strategy
Proper indexing is critical for Supabase performance:
Index Categories
| Category | Example | Index Type | Notes |
|---|---|---|---|
| Primary Key | id | B-Tree (automatic) | Use UUID for new tables |
| Foreign Key | parent_id | B-Tree | Critical for JOINs |
| Tenant Scope | organization_id | B-Tree | Required for RLS performance |
| Status | status | B-Tree | Often combined with org |
| Temporal | created_at | B-Tree DESC | For recent-first queries |
| JSONB | custom_data | GIN | For containment queries |
| Text Search | name | GIN (tsvector) | For full-text search |
Index Creation Script Pattern
-- Foreign key indexes (critical for JOIN performance)
CREATE INDEX IF NOT EXISTS idx_entity_items_entity_id
ON entity_items(entity_id);
-- Tenant-scoped queries (CRITICAL for RLS performance)
CREATE INDEX IF NOT EXISTS idx_entities_org
ON entities(organization_id);
-- Combined index for common query patterns
CREATE INDEX IF NOT EXISTS idx_entities_org_status
ON entities(organization_id, status);
-- Temporal queries
CREATE INDEX IF NOT EXISTS idx_entities_created
ON entities(created_at DESC);
-- JSONB containment queries
CREATE INDEX IF NOT EXISTS idx_entities_custom_data
ON entities USING GIN(custom_data);
-- JSONB specific key queries (expression index)
CREATE INDEX IF NOT EXISTS idx_entities_custom_industry
ON entities((custom_data->>'industry'));
Index Count by Category
Our final index breakdown:
| Category | Count | Percentage |
|---|---|---|
| Foreign Key | 217 | 44% |
| Tenant Scope | 89 | 18% |
| Status/Type | 72 | 15% |
| Temporal | 58 | 12% |
| Composite | 41 | 8% |
| JSONB/GIN | 14 | 3% |
| Total | 491 | 100% |
Foreign Key Strategy
Phased Constraint Creation
Critical insight: Creating foreign keys during data load is extremely slow. Each INSERT triggers constraint validation.
Our approach:
- Create tables without foreign keys
- Migrate all data (using service role)
- Clean orphaned references
- Add foreign keys
- Create indexes
- Enable and create RLS policies
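For very large tables, one option for the "Add foreign keys" step is PostgreSQL's NOT VALID / VALIDATE CONSTRAINT pattern, which adds the constraint without scanning existing rows up front (a sketch with hypothetical names, run after orphan cleanup):
-- Add the constraint without validating existing rows (fast, brief lock)
ALTER TABLE entity_items
  ADD CONSTRAINT fk_entity_items_entity
  FOREIGN KEY (entity_id) REFERENCES entities(id)
  ON DELETE CASCADE
  NOT VALID;
-- Validate in a separate step; this scans the table but uses a weaker lock
ALTER TABLE entity_items
  VALIDATE CONSTRAINT fk_entity_items_entity;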
Orphan Cleanup
Before adding foreign keys, clean up invalid references:
-- Find orphans: child references non-existent parent
SELECT child.id, child.parent_id
FROM child_table child
LEFT JOIN parent_table parent ON child.parent_id = parent.id
WHERE child.parent_id IS NOT NULL
AND parent.id IS NULL;
-- Clean orphans: set to NULL
UPDATE child_table child
SET parent_id = NULL
WHERE parent_id IS NOT NULL
AND NOT EXISTS (
SELECT 1 FROM parent_table
WHERE id = child.parent_id
);
Cascade Strategy Decision Matrix
| Relationship | ON DELETE | ON UPDATE | Reasoning |
|---|---|---|---|
| Strong ownership | CASCADE | CASCADE | Child meaningless without parent |
| Soft reference | SET NULL | CASCADE | Preserve child, clear reference |
| Lookup reference | RESTRICT | CASCADE | Prevent invalid state |
| Audit/history | NO ACTION | CASCADE | Never delete audit records |
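A sketch of how these choices look in DDL (table and constraint names are illustrative):
-- Strong ownership: items are meaningless without their parent entity
ALTER TABLE entity_items
  ADD CONSTRAINT fk_entity_items_entity
  FOREIGN KEY (entity_id) REFERENCES entities(id)
  ON DELETE CASCADE ON UPDATE CASCADE;
-- Soft reference: keep the task, just clear the assignee
ALTER TABLE tasks
  ADD CONSTRAINT fk_tasks_assignee
  FOREIGN KEY (assignee_id) REFERENCES members(id)
  ON DELETE SET NULL ON UPDATE CASCADE;
-- Lookup reference: block deleting a status that is still in use
ALTER TABLE entities
  ADD CONSTRAINT fk_entities_status
  FOREIGN KEY (status_id) REFERENCES statuses(id)
  ON DELETE RESTRICT ON UPDATE CASCADE;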
Leveraging Supabase Features
After migration, you can take advantage of Supabase's full feature set.
Realtime Subscriptions
Replace MongoDB change streams with Supabase Realtime:
import { createClient } from '@supabase/supabase-js'
const supabase = createClient(SUPABASE_URL, SUPABASE_ANON_KEY)
// Subscribe to changes on a table
const subscription = supabase
.channel('entities-changes')
.on(
'postgres_changes',
{
event: '*', // INSERT, UPDATE, DELETE
schema: 'public',
table: 'entities',
filter: `organization_id=eq.${orgId}`
},
(payload) => {
console.log('Change received:', payload)
// Handle the change
}
)
.subscribe()
Enable Realtime for Tables
-- Enable realtime for specific tables
ALTER PUBLICATION supabase_realtime ADD TABLE entities;
ALTER PUBLICATION supabase_realtime ADD TABLE entity_items;
Supabase Client for Application Code
Replace MongoDB driver with Supabase client:
// Before: MongoDB
const entities = await db.collection('entities')
.find({ organizationId: orgId, status: 'active' })
.toArray()
// After: Supabase (RLS handles org filtering automatically!)
const { data: entities, error } = await supabase
.from('entities')
.select('*')
.eq('status', 'active')
Edge Functions for Complex Operations
Replace MongoDB aggregations with Edge Functions:
// supabase/functions/calculate-metrics/index.ts
import { serve } from 'https://deno.land/std@0.168.0/http/server.ts'
import { createClient } from 'https://esm.sh/@supabase/supabase-js@2'
serve(async (req) => {
const supabase = createClient(
Deno.env.get('SUPABASE_URL')!,
Deno.env.get('SUPABASE_SERVICE_ROLE_KEY')!
)
// Complex aggregation that was painful in MongoDB
const { data, error } = await supabase.rpc('calculate_entity_metrics')
return new Response(JSON.stringify(data), {
headers: { 'Content-Type': 'application/json' }
})
})
Database Functions for Complex Queries
-- Create a function for complex aggregations
CREATE OR REPLACE FUNCTION calculate_entity_metrics(org_id UUID)
RETURNS TABLE (
status TEXT,
count BIGINT,
avg_items NUMERIC
) AS $$
BEGIN
RETURN QUERY
SELECT
e.status,
COUNT(*)::BIGINT,
AVG(item_count)::NUMERIC
FROM entities e
LEFT JOIN (
SELECT entity_id, COUNT(*) as item_count
FROM entity_items
GROUP BY entity_id
) items ON e.id = items.entity_id
WHERE e.organization_id = org_id
GROUP BY e.status;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
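Calling it from SQL looks like this (the UUID is a placeholder); from the client, the same function is reachable via supabase.rpc with the org_id parameter:
SELECT status, count, avg_items
FROM calculate_entity_metrics('00000000-0000-0000-0000-000000000000'::uuid);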
Storage Integration
Move binary data to Supabase Storage:
// Upload file
const { data, error } = await supabase.storage
.from('documents')
.upload(`${orgId}/${entityId}/${filename}`, file)
// Get public URL
const { data: { publicUrl } } = supabase.storage
.from('documents')
.getPublicUrl(`${orgId}/${entityId}/${filename}`)
// Store URL in database
await supabase
.from('entity_attachments')
.insert({ entity_id: entityId, file_url: publicUrl })
Error Handling & Data Quality
Graceful Degradation
A single bad document should not stop the entire migration:
import logging
from dataclasses import dataclass
from threading import Lock
@dataclass
class MigrationStats:
"""Thread-safe migration statistics."""
documents_processed: int = 0
documents_failed: int = 0
rows_inserted: int = 0
_lock: Lock = None
def __post_init__(self):
self._lock = Lock()
def record_success(self, rows: int):
with self._lock:
self.documents_processed += 1
self.rows_inserted += rows
def record_failure(self):
with self._lock:
self.documents_processed += 1
self.documents_failed += 1
def migrate_with_error_handling(bson_path: str, db_connection) -> MigrationStats:
"""Migrate collection with graceful error handling."""
stats = MigrationStats()
logger = logging.getLogger(__name__)
for doc in read_bson_file(bson_path):
try:
result = flatten_document(doc, 'entities')
rows_inserted = insert_all_rows(db_connection, result)
stats.record_success(rows_inserted)
except Exception as e:
stats.record_failure()
doc_id = doc.get('_id', 'unknown')
logger.error(f"Failed to migrate document {doc_id}: {e}")
# Save failed documents for retry
save_failed_document(doc, str(e))
return stats
Data Validation
MongoDB allows messy data. Sanitize during transformation:
from typing import Optional
def sanitize_text(value: Optional[str]) -> Optional[str]:
"""Sanitize text for Supabase insertion."""
if value is None:
return None
# Remove null bytes (PostgreSQL doesn't allow \x00)
value = value.replace('\x00', '')
# Ensure valid UTF-8
value = value.encode('utf-8', errors='replace').decode('utf-8')
# Trim excessive whitespace
value = ' '.join(value.split())
return value if value else None
Performance Optimizations
Parallel Collection Processing
Collections are independent—process them in parallel:
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List
def migrate_all_collections(
collection_paths: List[str],
db_connection_factory,
max_workers: int = 4
) -> dict:
"""Migrate multiple collections in parallel."""
results = {}
with ThreadPoolExecutor(max_workers=max_workers) as executor:
futures = {}
for path in collection_paths:
conn = db_connection_factory()
future = executor.submit(migrate_collection, path, conn)
futures[future] = path
for future in as_completed(futures):
path = futures[future]
try:
stats = future.result()
results[path] = stats
print(f"✓ {path}: {stats.documents_processed} docs")
except Exception as e:
print(f"✗ {path}: {e}")
results[path] = None
return results
Supabase Connection Pooling
Use Supabase's connection pooler for better performance:
# Use pooler connection string (port 6543) instead of direct (port 5432)
# Transaction mode for short queries
POOLER_URL = "postgresql://postgres.[ref]:[pass]@aws-0-[region].pooler.supabase.com:6543/postgres"
# Session mode for migrations (maintains state)
DIRECT_URL = "postgresql://postgres.[ref]:[pass]@db.[ref].supabase.co:5432/postgres"
Deferred Constraint Validation
Disable triggers during bulk load:
-- Before migration (run as superuser/service role)
SET session_replication_role = replica;
-- ... perform bulk insert ...
-- After migration
SET session_replication_role = DEFAULT;
Lessons Learned
1. Plan for Orphaned References
Problem: MongoDB doesn't enforce referential integrity. We found thousands of references to deleted documents.
Solution: Build orphan detection into your pipeline. Run cleanup before adding foreign keys.
2. RLS Performance Requires Indexes
Problem: RLS policies that filter by organization_id are slow without proper indexes.
Solution: Always create indexes on columns used in RLS policies BEFORE enabling policies.
-- Create index BEFORE policy
CREATE INDEX idx_entities_org ON entities(organization_id);
-- Then create policy
CREATE POLICY tenant_isolation ON entities
USING (organization_id = (auth.jwt()->>'org_id')::uuid);
3. Test with Production-Scale Data
Problem: A migration that works with 1,000 documents may fail with 100,000.
Solution: Test with full production data volume. Monitor memory and query performance.
4. Use Service Role for Migration
Problem: RLS policies block migration inserts if organization_id doesn't match JWT.
Solution: Use service role key during migration (bypasses RLS), then verify RLS works after.
5. Supabase CLI is Your Friend
Solution: Use Supabase CLI for:
- Schema migrations (supabase migration new)
- Local development (supabase start)
- Type generation (supabase gen types typescript)
6. Handle Time Zones Explicitly
Problem: MongoDB dates might be in different time zones.
Solution: Always use TIMESTAMPTZ and explicitly convert to UTC during migration.
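If a column accidentally ends up as a naive TIMESTAMP, it can be converted in place; a sketch assuming the stored values are already UTC:
ALTER TABLE entities
  ALTER COLUMN created_at TYPE TIMESTAMPTZ
  USING created_at AT TIME ZONE 'UTC';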
Results & Metrics
Migration Statistics
| Metric | Value |
|---|---|
| Source Collections | 65 |
| Target Tables | 282 |
| Total Rows | ~80,000 |
| Foreign Keys | 217 |
| Indexes | 491 |
| RLS Policies | 156 |
| Orphaned References Cleaned | 2,847 |
| Failed Documents | 12 |
| Success Rate | 99.98% |
Performance Comparison (Post-Migration)
| Query Type | MongoDB | Supabase | Improvement |
|---|---|---|---|
| Simple lookup | 5ms | 2ms | 60% |
| Aggregation | 250ms | 45ms | 82% |
| Join (3 tables) | N/A* | 15ms | - |
| Reporting query | 2.5s | 180ms | 93% |
| Realtime updates | Custom code | Built-in | ∞ |
*MongoDB required multiple queries and application-level joins.
Developer Experience Improvements
| Aspect | MongoDB | Supabase |
|---|---|---|
| Multi-tenant security | Application code | Database RLS |
| Realtime | Custom change streams | Built-in |
| Type safety | Manual | Generated types |
| Auth integration | Separate system | Integrated |
| Dashboard | MongoDB Compass | Supabase Studio |
Conclusion
Migrating from MongoDB to Supabase is more than a database change—it's an upgrade to a complete backend platform. The relational model brings data integrity, RLS brings security, and Supabase's ecosystem brings developer productivity.
Key Takeaways:
- Plan normalization carefully. Each nested structure needs a strategy.
- Embrace RLS. It's not just security—it simplifies your application code.
- Phase your migration. Schema → Data → Constraints → RLS policies.
- Use the full platform. Realtime, Storage, Edge Functions, and Auth are included.
- Test at scale. Performance characteristics change with data volume.
Next Steps After Migration
- Validation: Compare counts, verify data integrity, test RLS policies
- Update Application: Switch to Supabase client, remove manual tenant filtering
- Enable Realtime: Add subscriptions where needed
- Migrate Files: Move binary data to Supabase Storage
- Monitor: Use Supabase dashboard for query performance
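For the validation step above, a simple spot check is comparing per-table row counts against document and array counts from the BSON export (table names here are examples):
SELECT 'entities' AS table_name, COUNT(*) AS row_count FROM entities
UNION ALL
SELECT 'entity_metadata', COUNT(*) FROM entity_metadata
UNION ALL
SELECT 'entity_custom_fields', COUNT(*) FROM entity_custom_fields;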
This post is based on a real production migration. Numbers have been rounded, and specific table names have been generalized.
Questions about your own migration? Drop them in the comments!