In this project, I built an end-to-end ETL pipeline using Databricks and Delta Lake,
following the Bronze–Silver–Gold architecture.
The goal was to simulate a real-world data engineering pipeline with incremental
processing, workflow orchestration, and analytics-ready datasets.
Tech Stack
- Databricks (Free Edition)
- Apache Spark (PySpark)
- Delta Lake
- SQL
- GitHub
Architecture Overview
The pipeline follows the Bronze–Silver–Gold data architecture:
- Bronze Layer: Raw data ingestion (append-only)
- Silver Layer: Cleaned and deduplicated data with incremental updates using Delta MERGE
- Gold Layer: Aggregated business metrics optimized for analytics
Bronze Layer
The Bronze layer ingests raw CSV data into Delta tables in append mode.
This layer acts as the source of truth and allows full reprocessing if downstream
transformations fail.
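A minimal sketch of what the append-only ingestion step could look like in PySpark. The function name `ingest_bronze`, the source path, and the table name are hypothetical and not taken from the repository. It also assumes the CSV files have a header row and that `spark` is the session Databricks notebooks provide.

```python
def ingest_bronze(spark, source_path, bronze_table):
    """Append raw CSV rows to a Delta table without transforming them."""
    raw_df = (
        spark.read
        .option("header", "true")         # assumption: first CSV row holds column names
        .option("inferSchema", "false")   # keep every column as a string in Bronze
        .csv(source_path)
    )
    (
        raw_df.write
        .format("delta")
        .mode("append")                   # append-only: existing rows are never rewritten
        .saveAsTable(bronze_table)
    )
    return raw_df


# Example call in a notebook (hypothetical path and table name):
# ingest_bronze(spark, "/path/to/raw/sales.csv", "bronze.sales_raw")
```

Keeping Bronze untyped and untouched is a design choice in this sketch: any cleaning mistake in Silver can then be corrected by replaying Bronze.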
Silver Layer
The Silver layer performs data cleaning and deduplication.
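Incremental updates are applied with Delta Lake MERGE, which matches incoming rows to existing
rows on a key. Re-running the same batch therefore updates rows in place instead of inserting
duplicates, which keeps the processing idempotent.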
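The upsert could be sketched roughly as follows. This is an illustration under assumptions, not the code from the repository: the helper names `build_merge_sql` and `upsert_silver`, the view name `silver_updates`, and the key columns are all hypothetical, and the target Silver table is assumed to exist already with the same columns as the incoming batch.

```python
def build_merge_sql(target_table, source_view, key_cols):
    """Build a Delta Lake MERGE statement that upserts on the given key columns."""
    on_clause = " AND ".join(f"t.{c} = s.{c}" for c in key_cols)
    return (
        f"MERGE INTO {target_table} AS t\n"
        f"USING {source_view} AS s\n"
        f"ON {on_clause}\n"
        "WHEN MATCHED THEN UPDATE SET *\n"      # existing key: overwrite the row
        "WHEN NOT MATCHED THEN INSERT *"        # new key: insert the row
    )


def upsert_silver(spark, batch_df, target_table, key_cols):
    """Deduplicate a batch on its keys, then merge it into the Silver table."""
    # Keep one row per key (which duplicate survives is not specified), so that
    # no target row is matched by more than one source row during the MERGE.
    deduped = batch_df.dropDuplicates(key_cols)
    deduped.createOrReplaceTempView("silver_updates")  # hypothetical view name
    spark.sql(build_merge_sql(target_table, "silver_updates", key_cols))


# Example call (hypothetical table and key names):
# upsert_silver(spark, cleaned_df, "silver.sales", ["order_id"])
```

Because matched rows are updated rather than appended, running `upsert_silver` twice with the same batch leaves the table in the same state as running it once.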
Gold Layer
The Gold layer contains aggregated business metrics such as:
- Daily sales KPIs
- Customer-level metrics
- Product-level metrics
Gold tables are fully rebuilt in overwrite mode on each run, so the results are deterministic
and a re-run never double-counts rows from an earlier run.
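As a rough sketch, a daily sales rebuild might look like this. The column names `order_date`, `order_id`, and `amount`, the table names, and the function name are assumptions made for illustration. The actual schema is not shown in this post.

```python
# Hypothetical aggregation query; column names are assumed, not taken from the project.
DAILY_SALES_SQL = """
SELECT
    order_date,
    COUNT(DISTINCT order_id) AS order_count,
    SUM(amount)              AS total_sales
FROM {silver_table}
GROUP BY order_date
"""


def rebuild_gold_daily_sales(spark, silver_table, gold_table):
    """Recompute the daily sales KPIs from Silver and replace the Gold table."""
    daily = spark.sql(DAILY_SALES_SQL.format(silver_table=silver_table))
    (
        daily.write
        .format("delta")
        .mode("overwrite")   # replace the previous contents on every run
        .saveAsTable(gold_table)
    )
    return daily


# Example call (hypothetical table names):
# rebuild_gold_daily_sales(spark, "silver.sales", "gold.daily_sales")
```

A full overwrite trades some compute for simplicity: there is no incremental state to keep in sync, which suits small aggregate tables like these.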
Workflow Orchestration
The entire pipeline is orchestrated using Databricks Workflows.
Tasks run in sequence from Bronze to Silver. Once Silver completes, the Gold aggregations run in parallel.
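The dependency structure described above can be sketched as a task list shaped like a Databricks job definition, using the `task_key`, `depends_on`, and `notebook_task` fields. The task names and notebook paths below are hypothetical. The small `execution_waves` helper is only an illustration of how the dependencies resolve into stages and is not part of Databricks.

```python
# Hypothetical task graph: Bronze -> Silver -> three independent Gold tasks.
tasks = [
    {"task_key": "bronze_ingest",
     "notebook_task": {"notebook_path": "/etl/01_bronze"}},
    {"task_key": "silver_merge",
     "depends_on": [{"task_key": "bronze_ingest"}],
     "notebook_task": {"notebook_path": "/etl/02_silver"}},
    {"task_key": "gold_daily_sales",
     "depends_on": [{"task_key": "silver_merge"}],
     "notebook_task": {"notebook_path": "/etl/03_gold_daily_sales"}},
    {"task_key": "gold_customer_metrics",
     "depends_on": [{"task_key": "silver_merge"}],
     "notebook_task": {"notebook_path": "/etl/03_gold_customer"}},
    {"task_key": "gold_product_metrics",
     "depends_on": [{"task_key": "silver_merge"}],
     "notebook_task": {"notebook_path": "/etl/03_gold_product"}},
]


def execution_waves(task_list):
    """Group tasks into stages; tasks in the same stage can run in parallel."""
    deps = {
        t["task_key"]: {d["task_key"] for d in t.get("depends_on", [])}
        for t in task_list
    }
    done, waves = set(), []
    while len(done) < len(deps):
        # A task is ready once every task it depends on has finished.
        ready = sorted(k for k, d in deps.items() if k not in done and d <= done)
        if not ready:
            raise ValueError("cycle or missing dependency in task graph")
        waves.append(ready)
        done.update(ready)
    return waves


for stage in execution_waves(tasks):
    print(stage)
```

The three Gold tasks land in the same stage because each depends only on `silver_merge`, which is what allows them to run concurrently.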
Source Code
The complete source code is available on GitHub:
https://github.com/shubhkhandare/databricks-etl-sales
Conclusion
This project helped me understand how production-style ETL pipelines are designed
using Databricks and Delta Lake, including incremental processing and workflow orchestration.