In modern analytics engineering, tracking data changes over time is essential. Tools like dbt (data build tool) implement this using a concept called snapshots.
Snapshots allow you to preserve historical states of data without overwriting previous records.
What Is a dbt Snapshot?
A dbt snapshot is a SQL-based mechanism that:
- Runs a query on a dataset
- Compares results to previous runs
- Stores changes as new records
This creates a full history of how data evolves over time.
The Core Idea: Type 2 Slowly Changing Dimensions
dbt snapshots implement a classic data warehousing pattern:
Type 2 Slowly Changing Dimension (SCD2)
This means:
- Each change creates a new row
- Old data is preserved
- Validity periods are tracked
Example structure:
| order_id | status | dbt_valid_from | dbt_valid_to |
|---|---|---|---|
| 1 | pending | 2024-01-01 | 2024-01-02 |
| 1 | shipped | 2024-01-02 | NULL |
Why This Matters
Snapshots solve a key analytics problem:
👉 “How did this data look last week, last month, or last year?”
Use cases include:
- Customer state tracking
- Order lifecycle analysis
- Pricing history
- Marketing attribution
Under the Hood
When dbt runs a snapshot:
- It executes a SELECT query
- Compares results with stored snapshot table
- Detects changes using configured keys
- Inserts new records for changed rows
This ensures full historical traceability.
Real-World Impact
With snapshots, analysts can:
- Reconstruct historical dashboards
- Audit data changes
- Build time-aware ML features
Top comments (0)