DataEngineeringZoomcamp #dbt #AnalyticsEngineering #DataModeling
Part 1: Introduction to Analytics Engineering & dbt Fundamentals π―
What is Analytics Engineering?
The Evolution of Data Roles
Traditionally, there were two main roles in data:
| Role | Focus | Skills |
|---|---|---|
| Data Engineer | Building pipelines, infrastructure, data movement | Python, Spark, Airflow, cloud services |
| Data Analyst | Creating reports, dashboards, insights | SQL, Excel, BI tools |
But there was a gap! Who transforms the raw data into clean, analysis-ready tables? Enter the Analytics Engineer.
What Does an Analytics Engineer Do?
An Analytics Engineer sits between Data Engineering and Data Analytics:
βββββββββββββββββββ ββββββββββββββββββββββββ βββββββββββββββββββ
β Data Engineer β βββΊ β Analytics Engineer β βββΊ β Data Analyst β
β β β β β β
β β’ Pipelines β β β’ Transform data β β β’ Dashboards β
β β’ Infrastructureβ β β’ Data modeling β β β’ Reports β
β β’ Data movementβ β β’ Quality tests β β β’ Insights β
βββββββββββββββββββ β β’ Documentation β βββββββββββββββββββ
ββββββββββββββββββββββββ
Key responsibilities:
- π Transform raw data into clean, modeled datasets
- π§ͺ Write tests to ensure data quality
- π Document everything so others can understand
- π Build the "T" in ELT (Extract, Load, Transform)
The Kitchen Analogy π³
Think of a data warehouse like a restaurant:
| Restaurant | Data Warehouse | Who accesses it |
|---|---|---|
| Pantry (raw ingredients) | Staging area (raw data) | Data Engineers |
| Kitchen (cooking happens) | Processing area (transformations) | Analytics Engineers |
| Dining Hall (served dishes) | Presentation area (final tables) | Business users, Analysts |
Raw ingredients (data) come in, get processed (transformed), and are served as polished dishes (analytics-ready tables).
What is dbt? π οΈ
dbt stands for data build tool. It's the most popular tool for analytics engineering.
The Problems dbt Solves
Before dbt, data transformation was messy:
- β SQL scripts scattered everywhere with no organization
- β No version control (changes got lost)
- β No testing (errors discovered too late)
- β No documentation (nobody knew what anything meant)
- β No environments (changes went straight to production!)
dbt brings software engineering best practices to analytics:
- β Version control - Your SQL lives in Git
- β Modularity - Reusable pieces instead of copy-paste
- β Testing - Automated data quality checks
- β Documentation - Generated from your code
- β Environments - Separate dev and prod
How dbt Works
dbt follows a simple principle: write SQL, dbt handles the rest.
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Your dbt Project β
β β
β βββββββββββββββββ βββββββββββββββββ ββββββββββββββ β
β β models/*.sql βββββΊβ dbt compile βββββΊβ SQL Queriesβ β
β β (your logic) β β dbt run β β (executed) β β
β βββββββββββββββββ βββββββββββββββββ ββββββββββββββ β
β β β
β βΌ β
β ββββββββββββββββββββ β
β β Data Warehouse β β
β β (views/tables) β β
β ββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
- You write SQL files (called "models")
- dbt compiles them (adds warehouse-specific syntax)
- dbt runs them against your data warehouse
- Views/tables are created automatically!
dbt Core vs dbt Cloud
| Feature | dbt Core | dbt Cloud |
|---|---|---|
| Cost | Free (open source) | Free tier + paid plans |
| Where it runs | Your machine/server | Cloud-hosted |
| Setup | Manual installation | Browser-based IDE |
| Scheduling | Need external tool | Built-in scheduler |
| Best for | Local development, cost savings | Teams, ease of use |
π‘ For this course: You can use either! Local setup uses DuckDB + dbt Core (free). Cloud setup uses BigQuery + dbt Cloud.
Top comments (0)