DEV Community

Wangila russell

Data Engineering Pipeline: Understanding ETL vs ELT

Introduction
This week, I started learning Data Engineering concepts, and one of the most important topics we covered was ETL and ELT pipelines.

To make it practical, I built my first simple data pipeline using Python, which helped me understand how raw data is transformed into usable insights.

What is a Data Pipeline?

A data pipeline is a process that moves data from one system to another while transforming it into a usable format.

Think of it like:

Raw data → Cleaning → Processing → Final usable data
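
To make the flow above concrete, here is a minimal sketch of a pipeline as a chain of small functions. The sample records, field names, and function names are all illustrative assumptions, not part of any real dataset or library.

```python
# Hypothetical raw records; the field names are illustrative assumptions.
raw_rows = [
    {"city": " Nairobi ", "temp_c": "24.5"},
    {"city": "Mombasa", "temp_c": None},
    {"city": "kisumu", "temp_c": "27"},
]

def clean(rows):
    """Cleaning: drop rows with a missing temperature and trim whitespace."""
    return [
        {"city": r["city"].strip(), "temp_c": r["temp_c"]}
        for r in rows
        if r["temp_c"] is not None
    ]

def process(rows):
    """Processing: normalise city names and convert temperatures to floats."""
    return [
        {"city": r["city"].title(), "temp_c": float(r["temp_c"])}
        for r in rows
    ]

def run_pipeline(rows):
    """Run the stages in order: raw data -> cleaning -> processing -> final data."""
    return process(clean(rows))

final_rows = run_pipeline(raw_rows)
print(final_rows)
```

Keeping each stage as its own function makes it easy to test or swap one step without touching the others.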

What is ETL?
ETL = Extract, Transform, Load

  1. Extract – Get data from a source (API, database, file)
  2. Transform – Clean and structure the data
  3. Load – Store it in a database or warehouse

Example

  • Extract weather data from an API
  • Clean missing values
  • Store it in a database
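
The three ETL steps can be sketched roughly as follows. This is a simplified illustration, not a real weather client: `extract()` returns hard-coded, hypothetical records in place of an API call, and an in-memory SQLite database (from Python's standard library) stands in for the target database. The table and column names are assumptions.

```python
import sqlite3

def extract():
    """Extract: stand-in for an API call; returns hypothetical weather records."""
    return [
        {"city": "Nairobi", "temp_c": 24.5},
        {"city": "Mombasa", "temp_c": None},  # missing reading
        {"city": "Kisumu", "temp_c": 27.0},
    ]

def transform(records):
    """Transform: drop records with a missing temperature before loading."""
    return [r for r in records if r["temp_c"] is not None]

def load(records, conn):
    """Load: write only the cleaned records into the database."""
    conn.execute("CREATE TABLE IF NOT EXISTS weather (city TEXT, temp_c REAL)")
    conn.executemany(
        "INSERT INTO weather (city, temp_c) VALUES (:city, :temp_c)", records
    )
    conn.commit()

etl_conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
load(transform(extract()), etl_conn)
etl_rows = etl_conn.execute(
    "SELECT city, temp_c FROM weather ORDER BY city"
).fetchall()
print(etl_rows)
```

Note that the record with the missing value never reaches the database, because the cleaning happens before the load.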

What is ELT?

ELT = Extract, Load, Transform

  1. Extract – Get raw data
  2. Load – Store raw data first
  3. Transform – Clean and process inside the database

Key idea: In ELT, transformation happens after loading, usually in powerful systems like data warehouses.

ETL | ELT
Transform before loading | Transform after loading
Used in traditional systems | Used in modern cloud systems
Less raw data stored | Raw data stored first
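
For contrast, here is a minimal ELT sketch. It assumes an in-memory SQLite database as a stand-in for a data warehouse, and the records and table names (`raw_weather`, `clean_weather`) are hypothetical. The raw data is loaded untouched, and the cleaning is done afterwards with SQL inside the database.

```python
import sqlite3

# Hypothetical raw records, including one missing temperature.
raw_records = [
    ("Nairobi", 24.5),
    ("Mombasa", None),
    ("Kisumu", 27.0),
]

elt_conn = sqlite3.connect(":memory:")  # stand-in for a warehouse

# Load: store the raw data first, missing values included.
elt_conn.execute("CREATE TABLE raw_weather (city TEXT, temp_c REAL)")
elt_conn.executemany("INSERT INTO raw_weather VALUES (?, ?)", raw_records)

# Transform: clean inside the database with SQL, after loading.
elt_conn.execute(
    """
    CREATE TABLE clean_weather AS
    SELECT city, temp_c
    FROM raw_weather
    WHERE temp_c IS NOT NULL
    """
)
elt_conn.commit()

raw_count = elt_conn.execute("SELECT COUNT(*) FROM raw_weather").fetchone()[0]
clean_count = elt_conn.execute("SELECT COUNT(*) FROM clean_weather").fetchone()[0]
print(raw_count, clean_count)
```

Because the raw table is kept, the transformation can be rerun or changed later without extracting the data again.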

My First Data Pipeline

This week, I built a simple Python pipeline that:

  1. Extracted data from a CollectAPI endpoint
  2. Parsed the response into a structured format
  3. Converted it into a Pandas DataFrame
  4. Cleaned the data and stored it in my database

Here is the code snippet that transforms the data:

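import pandas as pd

def transform_cities(data):
    # Build a DataFrame from the extracted records
    cities_df = pd.DataFrame(data)
    # Drop the "lowername" column, which is not needed downstream
    cities_df = cities_df.drop(columns="lowername")
    # Rename the "name" column to "cities"
    cities_df = cities_df.rename(columns={"name": "cities"})
    return cities_df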
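
The snippet covers the transform step only. As a rough sketch of how the final "store it in the database" step might look, the example below runs the same function on hypothetical sample records and writes the result to an in-memory SQLite database with pandas' `to_sql`. The sample data, the table name `cities_clean`, and the choice of SQLite are assumptions for illustration, not the actual setup used in the pipeline.

```python
import sqlite3

import pandas as pd

def transform_cities(data):
    cities_df = pd.DataFrame(data)
    cities_df = cities_df.drop(columns="lowername")
    cities_df = cities_df.rename(columns={"name": "cities"})
    return cities_df

# Hypothetical records shaped like the columns the function expects.
sample = [
    {"name": "Nairobi", "lowername": "nairobi"},
    {"name": "Mombasa", "lowername": "mombasa"},
]

cities_df = transform_cities(sample)

# Load: write the DataFrame to a table (assumed name: cities_clean).
load_conn = sqlite3.connect(":memory:")
cities_df.to_sql("cities_clean", load_conn, if_exists="replace", index=False)

# Read the table back to confirm what was stored.
stored = pd.read_sql_query("SELECT cities FROM cities_clean", load_conn)
print(stored)
```

Passing `index=False` keeps the DataFrame's row index out of the table, so only the `cities` column is stored.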

Final Thoughts

Building my first pipeline helped me connect theory with practice. It showed me that data engineering is not just about tools; it’s about designing flow and structure for data.

This is just the beginning, and I’m excited to build more complex pipelines in the future.
