A story about discovering the right materialization for the right use case
The Problem: "Will My Data Be Stale in 90 Days?"
Picture this: You're building a daily monitoring dashboard for ridepooling stop usage. Your stakeholder asks a simple question: "What happens if I query this data in 90 days? Will I get stale data?"
That question led me down a rabbit hole that taught me more about dbt materializations than any tutorial ever could.
The Discovery: Everything Goes Stale
My initial stop_network_usage model was materialized as a table with a hard-coded 30-day lookback window:
{{ config(materialized="table") }}

select
    stops.stop_id,
    date_trunc('day', trips.occurred_at) as usage_date,
    count(trips.trip_id) as daily_usage
from stops
inner join trips on [...]
where trips.occurred_at >= current_date - interval '30' day
group by 1, 2
The harsh reality hit: without regular dbt runs, every table in our data warehouse goes stale.
- Fresh events flow into source systems continuously
- dbt tables only update when manually executed
- Queries read stale data until the next dbt run
If I didn't run dbt for 90 days, my dashboard would be showing a window that ended 90 days ago and stretched back 120 days (90 days since the last run plus the 30-day lookback). Ouch.
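A quick check along these lines makes that math visible (the schema and table name are placeholders for wherever your project builds the model):

-- Staleness check against the built table (schema/table name is illustrative).
-- Both values are frozen at whatever day the last dbt run happened.
select
    max(usage_date) as newest_day_in_table,  -- the day of the last dbt run, at best
    min(usage_date) as oldest_day_in_table   -- that day minus the 30-day window
from analytics.stop_network_usage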
The "Complex" Solution Path
My first instinct was to over-engineer:
- Convert to incremental materialization ✓
- Add scheduling infrastructure (cron jobs, Airflow, AWS EventBridge)
- Implement monitoring and alerting
- Add sliding window logic for data retention
- Set up error handling and recovery procedures
I started down this path, converting my model to incremental:
{{ config(materialized="incremental", unique_key=["stop_id", "usage_date"]) }}

-- Complex incremental logic with is_incremental() conditions
{% if is_incremental() %}
    and date_trunc('day', trips.occurred_at) > (select max(usage_date) from {{ this }})
{% else %}
    and trips.occurred_at >= current_date - interval '90' day
{% endif %}
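For context, the complete model would have looked roughly like this. It's a sketch that mirrors the table model above; the pickup-stop join condition is an assumption borrowed from the view model further down, since the original join was elided:

{{ config(materialized="incremental", unique_key=["stop_id", "usage_date"]) }}

select
    stops.stop_id,
    date_trunc('day', trips.occurred_at) as usage_date,
    count(trips.trip_id) as daily_usage
from stops
-- assumed join on the pickup stop column; the original join condition was elided
inner join trips on trips.link__stop_id__pickup = stops.stop_id
where 1 = 1
{% if is_incremental() %}
    -- incremental runs only process days newer than what the table already holds
    and date_trunc('day', trips.occurred_at) > (select max(usage_date) from {{ this }})
{% else %}
    -- the first run (or --full-refresh) backfills 90 days of history
    and trips.occurred_at >= current_date - interval '90' day
{% endif %}
group by 1, 2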
While learning about incremental models was valuable (they're genuinely powerful for large datasets), I realized I was building a complex solution for a simple problem.
The Simple Solution: Views to the Rescue
Then it hit me. For my use case, I had a much simpler option: views.
{{ config(materialized="view") }}

select
    stops.stop_id,
    stops.service_area_uuid,
    stops.name_en as stop_name_en,
    date(trip_usage.occurred_at) as trip_date,
    count(distinct trip_usage.trip_id) as daily_trip_count
from stops
left join (
    select link__stop_id__pickup as stop_id, trip_id, occurred_at
    from trips
    where link__stop_id__pickup is not null
    union
    select link__stop_id__delivery as stop_id, trip_id, occurred_at
    from trips
    where link__stop_id__delivery is not null
) trip_usage on stops.stop_id = trip_usage.stop_id
group by [...]
Problem solved. Views query the source data directly every time, so they're always fresh—no automation required.
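The lookback window also moves to where it belongs: the consuming query. Something like this (view name and schema are placeholders) is what the dashboard would run, and because the model is a view, the 30-day window is always anchored to today rather than to the last dbt run:

-- Example dashboard query (view name/schema are illustrative). The view's
-- SELECT re-reads stops and trips at query time, so results are always current.
select stop_name_en, trip_date, daily_trip_count
from analytics.stop_network_usage
where trip_date >= current_date - interval '30' day
order by trip_date desc, daily_trip_count desc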
Views vs. Tables vs. Incremental: Choose Your Fighter
This experience taught me that dbt materializations aren't just technical choices; they're architectural decisions with real business implications. Here's how they compare (with a minimal config example after the breakdown):
📊 Views: Always Fresh, Simple
- Best for: Small-medium datasets, infrequent queries, freshness > performance
- Pros: Always fresh data, no scheduling complexity
- Cons: Slower queries, higher compute costs
- My use case: ✅ Perfect for daily dashboard monitoring
🗄️ Tables: Fast Performance, Can Go Stale
- Best for: Large datasets, frequent queries, acceptable staleness
- Pros: Pre-computed results, fast queries
- Cons: Requires regular refresh, can go stale
- My use case: ❌ Would need complex automation
⚡ Incremental: Best of Both Worlds (When You Need It)
- Best for: Very large datasets, regular updates, historical accumulation
- Pros: Efficient processing, fast subsequent runs, accumulates history
- Cons: Complex logic, still needs scheduling, overkill for simple use cases
- My use case: ❌ Over-engineered for my requirements
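The nice part is that switching between them is a one-line change at the top of the model (or a folder-level default in dbt_project.yml). A minimal, illustrative model using the pickup column from the trips table above shows where that choice lives:

-- Swap this one value ("view", "table", or "incremental") to change how dbt
-- builds the same SELECT; incremental would additionally want a unique_key
-- and an is_incremental() filter, as shown earlier.
{{ config(materialized="view") }}

select link__stop_id__pickup as stop_id, count(*) as pickup_count
from trips
where link__stop_id__pickup is not null
group by 1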
The Key Insight: Match the Tool to the Problem
The temptation in data engineering is to always reach for the most sophisticated solution. Incremental models are powerful and automated pipelines are impressive, but sometimes the simple answer is the right answer.
My requirements:
- Daily UI monitoring ✓
- Always fresh data ✓
- Reasonable data volumes ✓
- Simple aggregations ✓
Views delivered exactly what I needed without any of the operational overhead.
Lessons Learned
- Question your assumptions: Not every data model needs to be a table or incremental
- Understand the trade-offs: Performance vs. freshness vs. complexity
- Size your solution to your problem: Views are underrated for the right use cases
- Think about operations: Who will maintain your complex automation?
The Bottom Line
Sometimes the best engineering decision is choosing the simplest solution that meets your requirements. Views might not be glamorous, but they solved my stale data problem elegantly.
The next time someone asks "Will my data be stale in 90 days?", I'll confidently answer: "Not if you use views."
What materialization strategies have you found work best for your use cases? I'd love to hear about times when simple solutions beat complex ones in your data projects.