Dinesh Palli

Completed: End-to-End Data Engineering Project on Microsoft Azure

I'm excited to share the successful completion of my recent end-to-end data engineering project!

This project provided comprehensive, hands-on experience in building a robust data pipeline on Microsoft Azure: migrating data from an on-premises SQL Server database, implementing an automated daily ETL/ELT pipeline, and delivering insights through a Power BI dashboard built to IBCS standards. It covered the full data lifecycle, from source-system extraction to business intelligence reporting.

Diagram illustrating an end-to-end Azure data engineering pipeline. Data originates from an On-Premises SQL Server, is ingested via Azure Data Factory, transformed through Bronze, Silver, and Gold layers in Azure Data Lake Storage Gen2 using Azure Databricks, and then loaded into Azure Synapse Analytics for visualization in Power BI. Supporting services like Azure Key Vault and Azure AD are also shown.

Project Highlights:

• Data Ingestion: Leveraged Azure Data Factory (ADF) to ingest data from the on-premises SQL Server database, setting up a Self-Hosted Integration Runtime (SHIR) for secure connectivity and building a dynamic, parameterized pipeline for table ingestion (see the first sketch after this list).

• Data Lake Storage & Medallion Architecture: Implemented a medallion-style data lake on Azure Data Lake Storage Gen2 (ADLS Gen2) with three tiers for efficient data organization and processing: Bronze (raw Parquet), Silver (cleaned Delta), and Gold (aggregated Delta).

• Data Transformation: Developed PySpark notebooks in Azure Databricks for robust data cleaning, transformation, schema evolution, and business-logic application between the lake layers (see the PySpark sketch after this list).

• Data Warehousing & Analytics: Integrated Azure Synapse Analytics, using its serverless SQL pool and creating views over the Gold layer for efficient querying and Power BI connectivity (see the serverless view sketch below).

• Business Intelligence & Data Visualization: Developed an interactive, IBCS-compliant dashboard in Power BI to present key insights from the processed data, performing data modeling and establishing table relationships along the way.

• Security & Automation: Implemented Azure Key Vault for secure storage and retrieval of credentials within ADF and Databricks (see the secret-scope sketch below), designed daily pipeline automation using ADF triggers, and outlined security best practices using Azure Active Directory (Microsoft Entra ID).
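
To make a few of these highlights concrete, here are some minimal, hedged sketches in Python. First, the ingestion step: the project drives this through the ADF UI with a parameterized pipeline, but an equivalent run can be started from the azure-mgmt-datafactory SDK. All resource, factory, pipeline, and parameter names below are illustrative placeholders, not the project's actual names.

```python
# Sketch: starting a parameterized ADF ingestion run from Python.
# Assumes `pip install azure-identity azure-mgmt-datafactory` and a
# hypothetical pipeline "pl_ingest_table" that copies one table through
# the SHIR into the bronze layer as Parquet.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

credential = DefaultAzureCredential()  # CLI login, managed identity, or env vars
adf = DataFactoryManagementClient(credential, "<subscription-id>")

run = adf.pipelines.create_run(
    resource_group_name="rg-data-eng",  # placeholder
    factory_name="adf-data-eng",        # placeholder
    pipeline_name="pl_ingest_table",    # placeholder
    parameters={"schema_name": "SalesLT", "table_name": "Customer"},
)
print(f"Started pipeline run: {run.run_id}")
```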
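
Next, the medallion hops as they might look in a Databricks PySpark notebook. The storage layout, column names, and aggregation are assumptions for illustration; `spark` is the session Databricks predefines in notebooks.

```python
# Sketch: bronze (raw Parquet) -> silver (cleaned Delta) -> gold (aggregated Delta).
from pyspark.sql import functions as F

base = "abfss://medallion@<storage-account>.dfs.core.windows.net"  # placeholder

# Bronze: raw Parquet exactly as ADF landed it.
bronze = spark.read.parquet(f"{base}/bronze/SalesLT/Customer")

# Silver: light cleaning -- normalize types and drop duplicate keys.
silver = (
    bronze
    .withColumn("ModifiedDate", F.to_date("ModifiedDate"))
    .dropDuplicates(["CustomerID"])
)
silver.write.format("delta").mode("overwrite").save(f"{base}/silver/SalesLT/Customer")

# Gold: a business-level aggregate ready for Synapse and Power BI.
gold = silver.groupBy("CompanyName").agg(F.count("CustomerID").alias("customer_count"))
gold.write.format("delta").mode("overwrite").save(f"{base}/gold/customers_by_company")
```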
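
For the warehousing step, views like these are typically authored in Synapse Studio; one way to issue the same DDL from Python is pyodbc against the serverless endpoint. The endpoint, database, and storage paths are placeholders, and the target database must already exist (views cannot be created in master).

```python
# Sketch: exposing the gold Delta table to Power BI through a serverless SQL view.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<workspace>-ondemand.sql.azuresynapse.net;"  # placeholder
    "DATABASE=gold_db;"                                  # placeholder, pre-created
    "Authentication=ActiveDirectoryInteractive;",
    autocommit=True,
)

conn.cursor().execute("""
CREATE OR ALTER VIEW dbo.customers_by_company AS
SELECT *
FROM OPENROWSET(
    BULK 'https://<storage-account>.dfs.core.windows.net/medallion/gold/customers_by_company/',
    FORMAT = 'DELTA'
) AS rows;
""")
```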
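
Finally, credential handling: in Databricks, Key Vault typically sits behind a Key Vault-backed secret scope, so notebooks never hold raw secrets. The scope, key, app, and tenant IDs below are placeholders, and `dbutils` exists only inside Databricks.

```python
# Sketch: reading a service-principal secret from a Key Vault-backed scope
# and configuring OAuth access to ADLS Gen2 for the current Spark session.
storage = "<storage-account>"  # placeholder
secret = dbutils.secrets.get(scope="kv-data-eng", key="sp-client-secret")

spark.conf.set(f"fs.azure.account.auth.type.{storage}.dfs.core.windows.net", "OAuth")
spark.conf.set(
    f"fs.azure.account.oauth.provider.type.{storage}.dfs.core.windows.net",
    "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
)
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage}.dfs.core.windows.net", "<app-client-id>")
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage}.dfs.core.windows.net", secret)
spark.conf.set(
    f"fs.azure.account.oauth2.client.endpoint.{storage}.dfs.core.windows.net",
    "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
)
```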

This project gave me hands-on experience with crucial data engineering concepts, including ETL/ELT processes, data modeling, cloud architecture, and data governance, while building an entire pipeline from source to dashboard.

I've documented the project extensively in my GitHub repo, including notes, graphics, screen captures, and explanations of key topics. Please find it here: https://github.com/dineshpalli/rg-data-engineering-project

I would like to extend my sincere thanks to Luke J. Byrne for his excellent walkthrough video (https://youtu.be/ygJ11fzq_ik?si=1NV-tvrmJhvOTnXp) and his original repository (https://github.com/lukejbyrne/rg-data-engineering-project), which were invaluable resources throughout this project.

I encourage you to explore the repository. Feel free to clone, fork, or contribute to the project, and to leave feedback. I am always open to connecting with fellow data enthusiasts and professionals.

My LinkedIn
Luke's Channel
