Thesius Code

Posted on Mar 23 • Originally published at datanest-stores.pages.dev

Data Modeling Patterns

#database #sql #postgres #performance

Data Modeling Patterns

A comprehensive collection of battle-tested data modeling patterns for relational and NoSQL databases. Whether you're designing a greenfield schema or refactoring a legacy monolith, this kit gives you proven structures for normalization, denormalization, temporal data, polymorphic associations, and schema versioning — with runnable SQL and MongoDB examples you can adapt in minutes.

Key Features

12 relational patterns covering 1NF through 6NF with trade-off analysis for each normal form
8 NoSQL document patterns including bucket, outlier, subset, and computed patterns for MongoDB
Temporal data modeling with SCD Type 1, Type 2, and bi-temporal table designs with trigger-based automation
Polymorphic association strategies — single-table inheritance, class-table inheritance, and EAV with performance benchmarks
Schema versioning framework using migration-tracked DDL with rollback safety and version metadata tables
Anti-pattern catalog identifying 15 common modeling mistakes with before/after fixes
Decision flowcharts to choose the right pattern based on read/write ratio, data volume, and query access patterns
Cross-database translations showing equivalent patterns in PostgreSQL, MySQL, and MongoDB side by side

Quick Start

Extract the archive and explore the patterns by category:

unzip data-modeling-patterns.zip
cd data-modeling-patterns/

# Browse pattern categories
ls src/relational/
# normalization.sql  temporal.sql  polymorphic.sql  hierarchical.sql

ls src/nosql/
# document_patterns.js  bucket_pattern.js  computed_pattern.js

Apply the temporal SCD Type 2 pattern to your PostgreSQL database:

-- Slowly Changing Dimension Type 2: full history tracking
CREATE TABLE customer_history (
    surrogate_id    BIGSERIAL PRIMARY KEY,
    customer_id     INTEGER NOT NULL,
    name            VARCHAR(200) NOT NULL,
    email           VARCHAR(254) NOT NULL,
    tier            VARCHAR(20) DEFAULT 'standard',
    valid_from      TIMESTAMPTZ NOT NULL DEFAULT now(),
    valid_to        TIMESTAMPTZ DEFAULT '9999-12-31'::timestamptz,
    is_current      BOOLEAN DEFAULT TRUE,
    version         INTEGER DEFAULT 1
);

CREATE INDEX idx_customer_current ON customer_history (customer_id)
    WHERE is_current = TRUE;

-- Trigger to auto-expire previous version on UPDATE
CREATE OR REPLACE FUNCTION expire_customer_version()
RETURNS TRIGGER AS $$
BEGIN
    UPDATE customer_history
    SET valid_to = now(), is_current = FALSE
    WHERE customer_id = NEW.customer_id AND is_current = TRUE;
    NEW.valid_from := now();
    NEW.is_current := TRUE;
    NEW.version := (
        SELECT COALESCE(MAX(version), 0) + 1
        FROM customer_history WHERE customer_id = NEW.customer_id
    );
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

Architecture / How It Works

The patterns are organized into three tiers:

data-modeling-patterns/
├── src/
│   ├── relational/          # SQL patterns for PostgreSQL & MySQL
│   │   ├── normalization.sql
│   │   ├── temporal.sql
│   │   ├── polymorphic.sql
│   │   └── hierarchical.sql
│   ├── nosql/               # MongoDB document design patterns
│   │   ├── document_patterns.js
│   │   ├── bucket_pattern.js
│   │   └── computed_pattern.js
│   └── versioning/          # Schema version tracking
│       ├── version_table.sql
│       └── migration_log.sql
├── examples/                # Complete worked examples
│   ├── ecommerce_schema.sql
│   ├── analytics_star_schema.sql
│   └── multi_tenant.sql
├── docs/
│   ├── checklists/pre-deployment.md
│   └── decision_flowchart.md
└── config.example.yaml

Each pattern file contains: the DDL, sample DML, performance notes, and a "when to use / when to avoid" comment block.

Usage Examples

Star schema for analytics:

-- Fact table with foreign keys to dimensions
CREATE TABLE fact_orders (
    order_key       BIGSERIAL PRIMARY KEY,
    date_key        INTEGER REFERENCES dim_date(date_key),
    customer_key    INTEGER REFERENCES dim_customer(customer_key),
    product_key     INTEGER REFERENCES dim_product(product_key),
    quantity        INTEGER NOT NULL,
    unit_price      NUMERIC(10,2) NOT NULL,
    total_amount    NUMERIC(12,2) GENERATED ALWAYS AS (quantity * unit_price) STORED
);

-- Partitioned by month for query performance
CREATE TABLE fact_orders_partitioned (
    LIKE fact_orders INCLUDING ALL
) PARTITION BY RANGE (date_key);

MongoDB embedded vs referenced pattern:

// Embedded pattern — best when child data is always read with parent
db.orders.insertOne({
  _id: ObjectId(),
  customer: { name: "Jane Doe", email: "user@example.com" },
  items: [
    { sku: "WIDGET-001", qty: 3, price: 29.99 },
    { sku: "GADGET-042", qty: 1, price: 149.00 }
  ],
  total: 238.97,
  created_at: new Date()
});

// Referenced pattern — best when child data is shared across parents
db.orders.insertOne({
  customer_id: ObjectId("64a1f..."),
  item_ids: [ObjectId("64b2a..."), ObjectId("64b2b...")],
  total: 238.97
});

Configuration

# config.example.yaml
database:
  engine: postgresql        # postgresql | mysql | mongodb
  schema: public
  naming_convention: snake_case  # snake_case | camelCase

modeling:
  default_id_type: bigserial     # bigserial | uuid | ulid
  timestamp_columns: true        # auto-add created_at / updated_at
  soft_delete: true              # add deleted_at column to all tables
  audit_columns: true            # add created_by / updated_by

versioning:
  track_ddl_changes: true
  version_table: schema_versions

Best Practices

Start normalized, denormalize with evidence. Premature denormalization creates update anomalies that are expensive to fix later.
Use surrogate keys for dimension tables but keep natural keys as unique constraints for data quality.
Partition fact tables by time when row counts exceed 50M — the query planner will thank you.
Version your schema from day one. A schema_versions table costs nothing and saves hours during incident response.
Document cardinality assumptions in comments. "1:many" today can silently become "many:many" as requirements evolve.
Prefer TIMESTAMPTZ over TIMESTAMP in PostgreSQL. Timezone-naive timestamps cause subtle bugs in distributed systems.

Troubleshooting

Problem	Cause	Fix
Slow JOINs on large tables	Missing foreign key indexes	Add `CREATE INDEX` on all FK columns — PostgreSQL does NOT auto-index FKs
SCD Type 2 duplicates	Trigger not expiring old rows	Verify the `WHERE is_current = TRUE` clause in the expiry trigger
MongoDB reads getting slower	Unbounded embedded arrays	Switch to the bucket pattern — cap arrays at ~500 documents
Schema version drift between envs	Manual DDL changes in production	Enforce all DDL through the migration framework; block direct `ALTER TABLE`

This is 1 of 9 resources in the Database Admin Pro toolkit. Get the complete [Data Modeling Patterns] with all files, templates, and documentation for $29.

Get the Full Kit →

Or grab the entire Database Admin Pro bundle (9 products) for $109 — save 30%.

Get the Complete Bundle →

DEV Community

Data Modeling Patterns

Data Modeling Patterns

Key Features

Quick Start

Architecture / How It Works

Usage Examples

Configuration

Best Practices

Troubleshooting

Related Articles

Top comments (0)