DEV Community

Rizwan Saleem
Rizwan Saleem

Posted on

Data modeling for analytics: star schemas, fact tables, and dimensional modeling

Data modeling for analytics: star schemas, fact tables, and dimensional modeling

Analytical data modeling is different from transactional data modeling. Transactional models optimize for write performance and data integrity. Analytical models optimize for query performance and ease of understanding. The star schema is the most widely adopted analytical modeling pattern.

The star schema has a central fact table surrounded by dimension tables. The fact table contains the metrics or measures you want to analyze sales amount, page views, click count. Each fact row references dimension tables through foreign keys.

Fact tables come in different types. Transaction fact tables record individual events like each sale. Periodic snapshot fact tables record the state of things at regular intervals like daily inventory levels. Accumulating snapshot fact tables track the progress of a process.

Dimension tables should be denormalized and easy to query. A product dimension might include category, subcategory, brand, and supplier all in one table. This denormalization makes queries simpler and faster no joins needed to get product category.

Slowly changing dimensions handle the fact that dimension attributes change over time. Type 1 overwrites the old value simple but loses history. Type 2 creates a new row with effective dates preserves history but grows the table. Type 3 stores limited history.

dbt has become the standard for transforming raw data into star schemas. dbt models define transformations in SQL, handle dependencies automatically, and test your data quality. It's worth investing in dbt for consistent analytical data models.

Start with a simple star schema for your most important business process. A well-designed star schema for one process is more valuable than a complex model that covers everything poorly.

-

Rizwan Saleem | https://rizwansaleem.co

Top comments (0)