Data Modeling Patterns
A comprehensive collection of battle-tested data modeling patterns for relational and NoSQL databases. Whether you're designing a greenfield schema or refactoring a legacy monolith, this kit gives you proven structures for normalization, denormalization, temporal data, polymorphic associations, and schema versioning — with runnable SQL and MongoDB examples you can adapt in minutes.
Key Features
- 12 relational patterns covering 1NF through 6NF with trade-off analysis for each normal form
- 8 NoSQL document patterns including bucket, outlier, subset, and computed patterns for MongoDB
- Temporal data modeling with SCD Type 1, Type 2, and bi-temporal table designs with trigger-based automation
- Polymorphic association strategies — single-table inheritance, class-table inheritance, and EAV with performance benchmarks
- Schema versioning framework using migration-tracked DDL with rollback safety and version metadata tables
- Anti-pattern catalog identifying 15 common modeling mistakes with before/after fixes
- Decision flowcharts to choose the right pattern based on read/write ratio, data volume, and query access patterns
- Cross-database translations showing equivalent patterns in PostgreSQL, MySQL, and MongoDB side by side
Quick Start
Extract the archive and explore the patterns by category:
unzip data-modeling-patterns.zip
cd data-modeling-patterns/
# Browse pattern categories
ls src/relational/
# normalization.sql temporal.sql polymorphic.sql hierarchical.sql
ls src/nosql/
# document_patterns.js bucket_pattern.js computed_pattern.js
Apply the temporal SCD Type 2 pattern to your PostgreSQL database:
-- Slowly Changing Dimension Type 2: full history tracking
CREATE TABLE customer_history (
surrogate_id BIGSERIAL PRIMARY KEY,
customer_id INTEGER NOT NULL,
name VARCHAR(200) NOT NULL,
email VARCHAR(254) NOT NULL,
tier VARCHAR(20) DEFAULT 'standard',
valid_from TIMESTAMPTZ NOT NULL DEFAULT now(),
valid_to TIMESTAMPTZ DEFAULT '9999-12-31'::timestamptz,
is_current BOOLEAN DEFAULT TRUE,
version INTEGER DEFAULT 1
);
CREATE INDEX idx_customer_current ON customer_history (customer_id)
WHERE is_current = TRUE;
-- Trigger to auto-expire previous version on UPDATE
CREATE OR REPLACE FUNCTION expire_customer_version()
RETURNS TRIGGER AS $$
BEGIN
UPDATE customer_history
SET valid_to = now(), is_current = FALSE
WHERE customer_id = NEW.customer_id AND is_current = TRUE;
NEW.valid_from := now();
NEW.is_current := TRUE;
NEW.version := (
SELECT COALESCE(MAX(version), 0) + 1
FROM customer_history WHERE customer_id = NEW.customer_id
);
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
Architecture / How It Works
The patterns are organized into three tiers:
data-modeling-patterns/
├── src/
│ ├── relational/ # SQL patterns for PostgreSQL & MySQL
│ │ ├── normalization.sql
│ │ ├── temporal.sql
│ │ ├── polymorphic.sql
│ │ └── hierarchical.sql
│ ├── nosql/ # MongoDB document design patterns
│ │ ├── document_patterns.js
│ │ ├── bucket_pattern.js
│ │ └── computed_pattern.js
│ └── versioning/ # Schema version tracking
│ ├── version_table.sql
│ └── migration_log.sql
├── examples/ # Complete worked examples
│ ├── ecommerce_schema.sql
│ ├── analytics_star_schema.sql
│ └── multi_tenant.sql
├── docs/
│ ├── checklists/pre-deployment.md
│ └── decision_flowchart.md
└── config.example.yaml
Each pattern file contains: the DDL, sample DML, performance notes, and a "when to use / when to avoid" comment block.
Usage Examples
Star schema for analytics:
-- Fact table with foreign keys to dimensions
CREATE TABLE fact_orders (
order_key BIGSERIAL PRIMARY KEY,
date_key INTEGER REFERENCES dim_date(date_key),
customer_key INTEGER REFERENCES dim_customer(customer_key),
product_key INTEGER REFERENCES dim_product(product_key),
quantity INTEGER NOT NULL,
unit_price NUMERIC(10,2) NOT NULL,
total_amount NUMERIC(12,2) GENERATED ALWAYS AS (quantity * unit_price) STORED
);
-- Partitioned by month for query performance
CREATE TABLE fact_orders_partitioned (
LIKE fact_orders INCLUDING ALL
) PARTITION BY RANGE (date_key);
MongoDB embedded vs referenced pattern:
// Embedded pattern — best when child data is always read with parent
db.orders.insertOne({
_id: ObjectId(),
customer: { name: "Jane Doe", email: "user@example.com" },
items: [
{ sku: "WIDGET-001", qty: 3, price: 29.99 },
{ sku: "GADGET-042", qty: 1, price: 149.00 }
],
total: 238.97,
created_at: new Date()
});
// Referenced pattern — best when child data is shared across parents
db.orders.insertOne({
customer_id: ObjectId("64a1f..."),
item_ids: [ObjectId("64b2a..."), ObjectId("64b2b...")],
total: 238.97
});
Configuration
# config.example.yaml
database:
engine: postgresql # postgresql | mysql | mongodb
schema: public
naming_convention: snake_case # snake_case | camelCase
modeling:
default_id_type: bigserial # bigserial | uuid | ulid
timestamp_columns: true # auto-add created_at / updated_at
soft_delete: true # add deleted_at column to all tables
audit_columns: true # add created_by / updated_by
versioning:
track_ddl_changes: true
version_table: schema_versions
Best Practices
- Start normalized, denormalize with evidence. Premature denormalization creates update anomalies that are expensive to fix later.
- Use surrogate keys for dimension tables but keep natural keys as unique constraints for data quality.
- Partition fact tables by time when row counts exceed 50M — the query planner will thank you.
-
Version your schema from day one. A
schema_versionstable costs nothing and saves hours during incident response. - Document cardinality assumptions in comments. "1:many" today can silently become "many:many" as requirements evolve.
-
Prefer
TIMESTAMPTZoverTIMESTAMPin PostgreSQL. Timezone-naive timestamps cause subtle bugs in distributed systems.
Troubleshooting
| Problem | Cause | Fix |
|---|---|---|
| Slow JOINs on large tables | Missing foreign key indexes | Add CREATE INDEX on all FK columns — PostgreSQL does NOT auto-index FKs |
| SCD Type 2 duplicates | Trigger not expiring old rows | Verify the WHERE is_current = TRUE clause in the expiry trigger |
| MongoDB reads getting slower | Unbounded embedded arrays | Switch to the bucket pattern — cap arrays at ~500 documents |
| Schema version drift between envs | Manual DDL changes in production | Enforce all DDL through the migration framework; block direct ALTER TABLE
|
This is 1 of 9 resources in the Database Admin Pro toolkit. Get the complete [Data Modeling Patterns] with all files, templates, and documentation for $29.
Or grab the entire Database Admin Pro bundle (9 products) for $109 — save 30%.
Top comments (0)