DEV Community

Abdelrahman Adnan
Part 10 - Base Model and Data Quality ✅

This part continues from the dbt setup and looks at the base layer in dags/air_quality_dbt/models/base/base_air_quality.sql and dags/air_quality_dbt/models/base/schema.yml.

The base model

The base model is a view over the staging table. It selects the core fields needed by downstream models:

  • station identifiers,
  • sensor identifiers,
  • measurement values,
  • coordinates,
  • weather context,
  • and time partitions.

This layer is where the project standardizes the source data before building marts on top of it.
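
As a rough illustration of that shape, the sketch below uses Python's built-in sqlite3 module as a stand-in for the warehouse. The staging table name stg_air_quality, the raw_payload column, and every column other than station_id, sensor_id, and target_country_name are assumptions made for the example, not the project's actual schema. The real model is a dbt SQL file, so treat this only as a picture of the idea.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the warehouse.
conn = sqlite3.connect(":memory:")

# Hypothetical staging table. Only station_id, sensor_id and
# target_country_name are named in the article; the rest are illustrative.
conn.execute(
    """
    CREATE TABLE stg_air_quality (
        station_id INTEGER,
        sensor_id INTEGER,
        target_country_name TEXT,
        measurement_value REAL,
        latitude REAL,
        longitude REAL,
        temperature_c REAL,
        measured_year INTEGER,
        measured_month INTEGER,
        raw_payload TEXT
    )
    """
)
conn.execute(
    "INSERT INTO stg_air_quality VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    (1, 10, "Country A", 42.5, 30.0, 31.0, 28.0, 2024, 6, "{}"),
)

# The base layer as a view: a named SELECT over staging that exposes
# only the core fields and leaves raw_payload behind.
conn.execute(
    """
    CREATE VIEW base_air_quality AS
    SELECT station_id, sensor_id, target_country_name, measurement_value,
           latitude, longitude, temperature_c, measured_year, measured_month
    FROM stg_air_quality
    """
)

cursor = conn.execute("SELECT * FROM base_air_quality")
base_columns = [col[0] for col in cursor.description]
base_rows = cursor.fetchall()
print(base_columns)
print(base_rows)
```

Downstream queries read from base_air_quality and never see the staging-only column, which is the kind of standardization the base layer is meant to provide.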

Why a view makes sense here

A base view is a good fit because it avoids copying data unnecessarily while still giving dbt a clean object to reference.

That means the warehouse load handles physical persistence, and dbt handles logical modeling.
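
A small sketch can make the split between persistence and modeling concrete. It again uses sqlite3 as a stand-in warehouse, with a hypothetical stg_air_quality table and a trimmed-down column list. The point it tries to show is that the view stores no rows of its own: a new row loaded into staging shows up in the base view without rebuilding anything.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the warehouse.
conn = sqlite3.connect(":memory:")

# Physical persistence: the load step writes to a real table
# (hypothetical name and columns).
conn.execute(
    "CREATE TABLE stg_air_quality ("
    "station_id INTEGER, sensor_id INTEGER, measurement_value REAL)"
)

# Logical modeling: the base layer is only a stored query.
conn.execute(
    "CREATE VIEW base_air_quality AS "
    "SELECT station_id, sensor_id, measurement_value FROM stg_air_quality"
)


def base_row_count(connection):
    """Count the rows currently visible through the base view."""
    return connection.execute(
        "SELECT COUNT(*) FROM base_air_quality"
    ).fetchone()[0]


count_before = base_row_count(conn)
conn.execute("INSERT INTO stg_air_quality VALUES (1, 10, 42.5)")
count_after = base_row_count(conn)

# SQLite's catalog records one table and one view, not two tables.
object_types = dict(
    conn.execute("SELECT name, type FROM sqlite_master").fetchall()
)
print(count_before, count_after, object_types)
```

The trade-off is that every query against the view re-runs the underlying SELECT, which is usually acceptable for a thin base layer like this one.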

Data quality checks

The schema file adds simple but important tests:

  • station_id should not be null,
  • sensor_id should not be null,
  • target_country_name should not be null.

These tests are simple, but they catch broken ingestion or malformed records early, before the marts build on them.
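
In spirit, a not-null test comes down to counting rows where the column is NULL and failing if any exist. The sketch below reproduces that idea with sqlite3 so it can run on its own. The helper failing_row_counts and the sample rows are hypothetical, and the project itself declares these checks in schema.yml rather than in Python.

```python
import sqlite3

# The three columns the article says must not be null.
REQUIRED_COLUMNS = ("station_id", "sensor_id", "target_country_name")


def failing_row_counts(connection, relation, columns):
    """Return {column: number of rows where that column is NULL}."""
    counts = {}
    for column in columns:
        # Identifiers come from the fixed tuple above, never from user input.
        query = f"SELECT COUNT(*) FROM {relation} WHERE {column} IS NULL"
        counts[column] = connection.execute(query).fetchone()[0]
    return counts


# Assumption: an in-memory table stands in for the base model's output.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE base_air_quality ("
    "station_id INTEGER, sensor_id INTEGER, target_country_name TEXT)"
)
conn.executemany(
    "INSERT INTO base_air_quality VALUES (?, ?, ?)",
    [
        (1, 10, "Country A"),   # clean record
        (2, None, "Country A"),  # sensor_id missing
        (3, 11, None),           # country missing
    ],
)

null_counts = failing_row_counts(conn, "base_air_quality", REQUIRED_COLUMNS)
failed_columns = [name for name, count in null_counts.items() if count > 0]
print(null_counts)
print(failed_columns)
```

A column with a count of zero passes. Any other count points to records that should be investigated before the marts consume them.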

What this teaches

This is a good example of how data quality in dbt starts small:

  • declare the source,
  • expose a clean base model,
  • assert the essential keys,
  • and let the downstream marts depend on the trusted layer.

Continue

The next part explains the mart tables themselves and shows how the project separates stations, sensors, and the final fact table.

Continue to Part 11: Dimensions and Fact Table.

Tag: #dataengineeringzoomcamp
