DEV Community

Abdelrahman Adnan
Abdelrahman Adnan

Posted on

# Module 4 Summary - Analytics Engineering with dbt

DataEngineeringZoomcamp #dbt #AnalyticsEngineering #DataModeling


Part 1: Introduction to Analytics Engineering & dbt Fundamentals 🎯

What is Analytics Engineering?

The Evolution of Data Roles

Traditionally, there were two main roles in data:

Role Focus Skills
Data Engineer Building pipelines, infrastructure, data movement Python, Spark, Airflow, cloud services
Data Analyst Creating reports, dashboards, insights SQL, Excel, BI tools

But there was a gap! Who transforms the raw data into clean, analysis-ready tables? Enter the Analytics Engineer.

What Does an Analytics Engineer Do?

An Analytics Engineer sits between Data Engineering and Data Analytics:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Data Engineer  β”‚ ──► β”‚  Analytics Engineer  β”‚ ──► β”‚   Data Analyst  β”‚
β”‚                 β”‚     β”‚                      β”‚     β”‚                 β”‚
β”‚  β€’ Pipelines    β”‚     β”‚  β€’ Transform data    β”‚     β”‚  β€’ Dashboards   β”‚
β”‚  β€’ Infrastructureβ”‚    β”‚  β€’ Data modeling     β”‚     β”‚  β€’ Reports      β”‚
β”‚  β€’ Data movementβ”‚     β”‚  β€’ Quality tests     β”‚     β”‚  β€’ Insights     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚  β€’ Documentation     β”‚     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
Enter fullscreen mode Exit fullscreen mode

Key responsibilities:

  • πŸ“Š Transform raw data into clean, modeled datasets
  • πŸ§ͺ Write tests to ensure data quality
  • πŸ“ Document everything so others can understand
  • πŸ”— Build the "T" in ELT (Extract, Load, Transform)

The Kitchen Analogy 🍳

Think of a data warehouse like a restaurant:

Restaurant Data Warehouse Who accesses it
Pantry (raw ingredients) Staging area (raw data) Data Engineers
Kitchen (cooking happens) Processing area (transformations) Analytics Engineers
Dining Hall (served dishes) Presentation area (final tables) Business users, Analysts

Raw ingredients (data) come in, get processed (transformed), and are served as polished dishes (analytics-ready tables).


What is dbt? πŸ› οΈ

dbt stands for data build tool. It's the most popular tool for analytics engineering.

The Problems dbt Solves

Before dbt, data transformation was messy:

  • ❌ SQL scripts scattered everywhere with no organization
  • ❌ No version control (changes got lost)
  • ❌ No testing (errors discovered too late)
  • ❌ No documentation (nobody knew what anything meant)
  • ❌ No environments (changes went straight to production!)

dbt brings software engineering best practices to analytics:

  • βœ… Version control - Your SQL lives in Git
  • βœ… Modularity - Reusable pieces instead of copy-paste
  • βœ… Testing - Automated data quality checks
  • βœ… Documentation - Generated from your code
  • βœ… Environments - Separate dev and prod

How dbt Works

dbt follows a simple principle: write SQL, dbt handles the rest.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                     Your dbt Project                        β”‚
β”‚                                                             β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚   β”‚  models/*.sql │───►│   dbt compile │───►│ SQL Queriesβ”‚ β”‚
β”‚   β”‚  (your logic) β”‚    β”‚   dbt run     β”‚    β”‚ (executed) β”‚ β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚                              β”‚                              β”‚
β”‚                              β–Ό                              β”‚
β”‚                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                     β”‚
β”‚                    β”‚  Data Warehouse  β”‚                     β”‚
β”‚                    β”‚  (views/tables)  β”‚                     β”‚
β”‚                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
Enter fullscreen mode Exit fullscreen mode
  1. You write SQL files (called "models")
  2. dbt compiles them (adds warehouse-specific syntax)
  3. dbt runs them against your data warehouse
  4. Views/tables are created automatically!

dbt Core vs dbt Cloud

Feature dbt Core dbt Cloud
Cost Free (open source) Free tier + paid plans
Where it runs Your machine/server Cloud-hosted
Setup Manual installation Browser-based IDE
Scheduling Need external tool Built-in scheduler
Best for Local development, cost savings Teams, ease of use

πŸ’‘ For this course: You can use either! Local setup uses DuckDB + dbt Core (free). Cloud setup uses BigQuery + dbt Cloud.


Top comments (0)