Part 11 - Dimensions and Fact Table 📊
This part continues from the base model and explains the mart layer in dags/air_quality_dbt/models/marts/.
Why the mart layer exists
The mart layer is where the analytics shape becomes obvious. Instead of keeping everything in one large staging table, the project splits the data into a star-schema-style layout.
That makes downstream querying simpler and more efficient.
The dimension tables
The project creates two dimensions:
-
dim_stationfrom dim_station.sql, -
dim_sensorfrom dim_sensor.sql.
These are deduplicated reference tables. Each one extracts the stable descriptive fields that are useful for analysis and dashboard filtering.
The fact table
The main analytical table is fact_air_quality, defined in fact_air_quality.sql.
This table keeps the actual readings and the relevant weather context in one place. That is why the dashboard queries can stay straightforward.
Why the star schema is useful here
The warehouse design intentionally denormalizes some location fields into the fact table. That reduces the number of joins needed for common dashboard questions while still keeping clean dimension tables for reference and filtering.
For a tutorial project, this is a solid middle ground between realism and simplicity.
Tests on the mart layer
The mart schema tests check the key constraints:
- station_id should be unique and not null in dim_station,
- sensor_id should be unique and not null in dim_sensor,
- station_id and sensor_id should not be null in fact_air_quality.
That keeps the model graph honest.
Continue
The next part moves from warehouse modeling into the visualization layer and explains how Superset is seeded with datasets, charts, and a dashboard layout.
Continue to Part 12: Superset Seeding and Dashboards.
Tag: #dataengineeringzoomcamp
Top comments (0)