DEV Community

Thesius Code
Thesius Code

Posted on • Originally published at datanest-stores.pages.dev

Data Modeling Patterns

Data Modeling Patterns

A comprehensive collection of battle-tested data modeling patterns for relational and NoSQL databases. Whether you're designing a greenfield schema or refactoring a legacy monolith, this kit gives you proven structures for normalization, denormalization, temporal data, polymorphic associations, and schema versioning — with runnable SQL and MongoDB examples you can adapt in minutes.

Key Features

  • 12 relational patterns covering 1NF through 6NF with trade-off analysis for each normal form
  • 8 NoSQL document patterns including bucket, outlier, subset, and computed patterns for MongoDB
  • Temporal data modeling with SCD Type 1, Type 2, and bi-temporal table designs with trigger-based automation
  • Polymorphic association strategies — single-table inheritance, class-table inheritance, and EAV with performance benchmarks
  • Schema versioning framework using migration-tracked DDL with rollback safety and version metadata tables
  • Anti-pattern catalog identifying 15 common modeling mistakes with before/after fixes
  • Decision flowcharts to choose the right pattern based on read/write ratio, data volume, and query access patterns
  • Cross-database translations showing equivalent patterns in PostgreSQL, MySQL, and MongoDB side by side

Quick Start

Extract the archive and explore the patterns by category:

unzip data-modeling-patterns.zip
cd data-modeling-patterns/

# Browse pattern categories
ls src/relational/
# normalization.sql  temporal.sql  polymorphic.sql  hierarchical.sql

ls src/nosql/
# document_patterns.js  bucket_pattern.js  computed_pattern.js
Enter fullscreen mode Exit fullscreen mode

Apply the temporal SCD Type 2 pattern to your PostgreSQL database:

-- Slowly Changing Dimension Type 2: full history tracking
CREATE TABLE customer_history (
    surrogate_id    BIGSERIAL PRIMARY KEY,
    customer_id     INTEGER NOT NULL,
    name            VARCHAR(200) NOT NULL,
    email           VARCHAR(254) NOT NULL,
    tier            VARCHAR(20) DEFAULT 'standard',
    valid_from      TIMESTAMPTZ NOT NULL DEFAULT now(),
    valid_to        TIMESTAMPTZ DEFAULT '9999-12-31'::timestamptz,
    is_current      BOOLEAN DEFAULT TRUE,
    version         INTEGER DEFAULT 1
);

CREATE INDEX idx_customer_current ON customer_history (customer_id)
    WHERE is_current = TRUE;

-- Trigger to auto-expire previous version on UPDATE
CREATE OR REPLACE FUNCTION expire_customer_version()
RETURNS TRIGGER AS $$
BEGIN
    UPDATE customer_history
    SET valid_to = now(), is_current = FALSE
    WHERE customer_id = NEW.customer_id AND is_current = TRUE;
    NEW.valid_from := now();
    NEW.is_current := TRUE;
    NEW.version := (
        SELECT COALESCE(MAX(version), 0) + 1
        FROM customer_history WHERE customer_id = NEW.customer_id
    );
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
Enter fullscreen mode Exit fullscreen mode

Architecture / How It Works

The patterns are organized into three tiers:

data-modeling-patterns/
├── src/
│   ├── relational/          # SQL patterns for PostgreSQL & MySQL
│   │   ├── normalization.sql
│   │   ├── temporal.sql
│   │   ├── polymorphic.sql
│   │   └── hierarchical.sql
│   ├── nosql/               # MongoDB document design patterns
│   │   ├── document_patterns.js
│   │   ├── bucket_pattern.js
│   │   └── computed_pattern.js
│   └── versioning/          # Schema version tracking
│       ├── version_table.sql
│       └── migration_log.sql
├── examples/                # Complete worked examples
│   ├── ecommerce_schema.sql
│   ├── analytics_star_schema.sql
│   └── multi_tenant.sql
├── docs/
│   ├── checklists/pre-deployment.md
│   └── decision_flowchart.md
└── config.example.yaml
Enter fullscreen mode Exit fullscreen mode

Each pattern file contains: the DDL, sample DML, performance notes, and a "when to use / when to avoid" comment block.

Usage Examples

Star schema for analytics:

-- Fact table with foreign keys to dimensions
CREATE TABLE fact_orders (
    order_key       BIGSERIAL PRIMARY KEY,
    date_key        INTEGER REFERENCES dim_date(date_key),
    customer_key    INTEGER REFERENCES dim_customer(customer_key),
    product_key     INTEGER REFERENCES dim_product(product_key),
    quantity        INTEGER NOT NULL,
    unit_price      NUMERIC(10,2) NOT NULL,
    total_amount    NUMERIC(12,2) GENERATED ALWAYS AS (quantity * unit_price) STORED
);

-- Partitioned by month for query performance
CREATE TABLE fact_orders_partitioned (
    LIKE fact_orders INCLUDING ALL
) PARTITION BY RANGE (date_key);
Enter fullscreen mode Exit fullscreen mode

MongoDB embedded vs referenced pattern:

// Embedded pattern — best when child data is always read with parent
db.orders.insertOne({
  _id: ObjectId(),
  customer: { name: "Jane Doe", email: "user@example.com" },
  items: [
    { sku: "WIDGET-001", qty: 3, price: 29.99 },
    { sku: "GADGET-042", qty: 1, price: 149.00 }
  ],
  total: 238.97,
  created_at: new Date()
});

// Referenced pattern — best when child data is shared across parents
db.orders.insertOne({
  customer_id: ObjectId("64a1f..."),
  item_ids: [ObjectId("64b2a..."), ObjectId("64b2b...")],
  total: 238.97
});
Enter fullscreen mode Exit fullscreen mode

Configuration

# config.example.yaml
database:
  engine: postgresql        # postgresql | mysql | mongodb
  schema: public
  naming_convention: snake_case  # snake_case | camelCase

modeling:
  default_id_type: bigserial     # bigserial | uuid | ulid
  timestamp_columns: true        # auto-add created_at / updated_at
  soft_delete: true              # add deleted_at column to all tables
  audit_columns: true            # add created_by / updated_by

versioning:
  track_ddl_changes: true
  version_table: schema_versions
Enter fullscreen mode Exit fullscreen mode

Best Practices

  1. Start normalized, denormalize with evidence. Premature denormalization creates update anomalies that are expensive to fix later.
  2. Use surrogate keys for dimension tables but keep natural keys as unique constraints for data quality.
  3. Partition fact tables by time when row counts exceed 50M — the query planner will thank you.
  4. Version your schema from day one. A schema_versions table costs nothing and saves hours during incident response.
  5. Document cardinality assumptions in comments. "1:many" today can silently become "many:many" as requirements evolve.
  6. Prefer TIMESTAMPTZ over TIMESTAMP in PostgreSQL. Timezone-naive timestamps cause subtle bugs in distributed systems.

Troubleshooting

Problem Cause Fix
Slow JOINs on large tables Missing foreign key indexes Add CREATE INDEX on all FK columns — PostgreSQL does NOT auto-index FKs
SCD Type 2 duplicates Trigger not expiring old rows Verify the WHERE is_current = TRUE clause in the expiry trigger
MongoDB reads getting slower Unbounded embedded arrays Switch to the bucket pattern — cap arrays at ~500 documents
Schema version drift between envs Manual DDL changes in production Enforce all DDL through the migration framework; block direct ALTER TABLE

This is 1 of 9 resources in the Database Admin Pro toolkit. Get the complete [Data Modeling Patterns] with all files, templates, and documentation for $29.

Get the Full Kit →

Or grab the entire Database Admin Pro bundle (9 products) for $109 — save 30%.

Get the Complete Bundle →


Related Articles

Top comments (0)