“Emerging Data Lakehouse Design & Practice Handbook: Data Lakehouse Modeling and Naming Standards (2025)” is composed of four progressive guides, with “Model Architecture → Common Standards → Layer Standards → Naming Standards” as the main line, systematically building an evolvable, governable, and shareable modern data lakehouse.
This article is the third in the series, providing a detailed analysis of design standards across data warehouse layers, covering synchronization, storage, and quality essentials.
The final article will analyze naming standards for each warehouse layer within this framework, helping enterprises build a full-chain methodology from data ingestion to value realization. Stay tuned for the full version.
1. ODS Layer Design Standards
Synchronization Standards:
- Each source table can only be synchronized once.
- Full initialization and incremental sync logic must be clear.
- Partition storage should be based on statistical date and time.
- Target table fields missing in the source must be auto-filled.
Table Categories & Lifecycle:
1. ODS Transactional Full Tables:
- Permanent preservation for non-reproducible data;
- Logs kept per retention requirements;
- Special date data retained as needed;
- Special month data retained as needed.
2. ODS Mirror Full Tables:
- Recommended daily storage;
- Preserve historical changes;
- Latest data stored in max partition;
- Historical data retained as needed.
3. ODS Incremental Data:
- Recommended daily storage;
- With corresponding full tables → keep 14 days;
- Without corresponding full tables → keep permanently.
4. Temporary Tables in ODS ETL:
- Retain on demand;
- Keep for max 7 days;
- Recommended delete after use, regenerate if needed.
5. BDSync Non-Deduped Data:
- Stored via intermediate layer;
- Default delete after use, not recommended for retention.
Data Quality:
- Full tables must configure unique identifiers.
- Monitor empty partition data.
- Monitor enum field changes and distribution.
- Monitor ODS data volume and row count fluctuations.
- All ODS tables must have comments.
2. Common Dimension Layer Standards
1) Design Principles
- Consistency: Field names, data types, and contents must align across physical tables.
- Dimension Combination & Splitting:
- Combination Principle: Combine strongly related fields (e.g., product attributes + brand).
- Unrelated/Misc Dimensions: Put into a special combined dimension.
- Behavior Dimensions: Calculated measures used as dimensions (e.g., click ranges).
- Splitting & Redundancy: Separate into core/extended tables by importance & frequency.
2) Storage & Lifecycle Management
Daily partitioning recommended.
- Max span ≤ 4 days → keep last 7 days.
- Max span ≤ 12 days → keep last 15 days.
- Max span ≤ 30 days → keep last 33 days.
- Max span ≤ 90 days → keep last 120 days.
- Max span ≤ 180 days → keep last 240 days.
- Max span ≤ 300 days → keep last 400 days.
3. DWD Detail Layer Standards
1) Storage & Lifecycle Management
Daily partitioning recommended (same rules as above).
2) Transactional Fact Table Principles
- Design based on data application needs and business processes.
- Use event date/time as partition for scan efficiency.
- Redundant subsets help reduce IO overhead.
- Degenerate dimensions reduce join costs.
3) Periodic Snapshot Fact Table
- Each row aggregates measures in a period (day/week/month).
- Granularity is periodic, not individual events.
- Typically includes many facts at consistent granularity.
4) Accumulating Snapshot Fact Table
- Built for multiple process analysis (e.g., purchase order flows).
- Analyze time intervals between events.
- Useful for events like closure, shipping not supported by transactional tables.
4. DWS Common Aggregation Layer Standards
Data warehouse performance is a key success criterion.
Aggregation improves performance by summarizing detailed data.
1) Basic Aggregation Principles
- Consistency: Must match detailed data query results.
- Avoid single-table designs: Don’t mix multiple granularities in one table.
- Flexible granularity: Aggregate at needed dimensions.
2) Aggregation Steps
- Determine aggregation dimension (e.g., by product).
- Determine roll-up level (daily, monthly, category, etc.).
- Determine aggregation facts (e.g., sales amount, order count).
3) Common Aggregation Layer Principles
- Data reusability: Aggregated data should serve multiple users.
- No cross-domain aggregation: Stay within business domain.
-
Differentiate statistical periods: Table names should indicate periods (e.g.,
_1d
,_td
,_nd
).
🫱 Previous Articles:
(I) Principles of Data Model Architecture: Four Layers and Seven Stages, the “First Cornerstone” of Data Lake and Warehouse Modeling
(Ⅱ) A Complete Guide to Core Data Warehouse Design Standards: From Layers, Types to Lifecycle
🫱 Next: Naming Standards for Data Warehouse Layers
Top comments (0)