Understanding ETL and ELT (Bite-size Article)

#database #datascience #etl #elt

Introduction

I previously wrote an introductory article about ETL. I’m still a complete beginner in this field, but I’ve been gradually learning while working hands-on as part of an ongoing project.

Recently, I came across ELT, an alternative to ETL. The three steps are the same (Extract, Transform, Load), but the difference lies in when the “Transform (T)” step happens. This post is both a personal memo and a simple summary of the differences between ETL and ELT.

The Specific Differences Between ETL and ELT

Here’s the simplest way to compare them:

ETL: Extract → Transform → Load
Get the data → Clean/transform it → Store it (in a DWH)
ELT: Extract → Load → Transform
Get the data → Store it first (in a DWH) → Transform it later

In this article, I use “DWH” (data warehouse) for simplicity, but in practice the destination could also be a data lake or another analytical platform.

In other words:

ETL: Data is transformed in a separate processing environment before being loaded into the DWH.
ELT: Raw data is loaded into the DWH first, then transformed inside the DWH using SQL or similar tools.
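To make the difference concrete, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for a DWH (an assumption purely for illustration; a real DWH such as BigQuery or Snowflake would be accessed through its own client), and the order data and table names are hypothetical. In `etl`, the cleanup happens in Python before loading; in `elt`, the raw strings are loaded as-is and the same cleanup is written as SQL that runs inside the database.

```python
import sqlite3

# Hypothetical raw records as they might arrive from a source system:
# everything is a string, and customer names are inconsistently formatted.
RAW_ORDERS = [
    ("1", " alice ", "1200"),
    ("2", "bob", "350"),
    ("3", "Alice", "800"),
]


def etl(conn: sqlite3.Connection) -> None:
    """ETL: transform in Python (outside the 'DWH'), then load clean rows."""
    cleaned = [
        (int(order_id), name.strip().lower(), int(amount))
        for order_id, name, amount in RAW_ORDERS
    ]
    conn.execute(
        "CREATE TABLE orders_clean (order_id INTEGER, customer TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO orders_clean VALUES (?, ?, ?)", cleaned)


def elt(conn: sqlite3.Connection) -> None:
    """ELT: load raw rows as-is, then transform inside the 'DWH' with SQL."""
    conn.execute("CREATE TABLE orders_raw (order_id TEXT, customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", RAW_ORDERS)
    # The transformation is plain SQL executed by the database itself.
    conn.execute(
        """
        CREATE TABLE orders_clean AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               LOWER(TRIM(customer))     AS customer,
               CAST(amount AS INTEGER)   AS amount
        FROM orders_raw
        """
    )


def total_by_customer(conn: sqlite3.Connection) -> list:
    """A simple analysis query against the cleaned table."""
    return conn.execute(
        "SELECT customer, SUM(amount) FROM orders_clean "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    etl_db = sqlite3.connect(":memory:")  # SQLite stands in for the DWH (assumption)
    etl(etl_db)
    elt_db = sqlite3.connect(":memory:")
    elt(elt_db)
    print(total_by_customer(etl_db))  # [('alice', 2000), ('bob', 350)]
    print(total_by_customer(elt_db))  # [('alice', 2000), ('bob', 350)]
```

Both paths end with the same `orders_clean` table. The only difference is where the transformation runs, and the ELT path also keeps `orders_raw` in the database for later reprocessing.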

Here’s a hamburger analogy:

ETL: At a central kitchen (ETL platform), the patty is grilled, the burger is assembled, and it’s fully wrapped—then delivered to the restaurant (saved to the DWH).
→ Ready to eat immediately, but if the customer says “no pickles,” it’s hard to redo.
ELT: Ingredients (patty, bun, vegetables, sauce) are stored in the restaurant’s fridge (DWH) first, and the burger is assembled to order (transformed inside the DWH using SQL/dbt).
→ Easy to customize later, but it requires more storage space and more work at the restaurant itself (i.e., storage and processing cost in the DWH).

Even in ELT, there’s often some “light inspection” before storage, such as removing personally identifiable information (PII) or checking data formats.
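A minimal sketch of such a pre-load check, assuming records arrive as Python dicts, a hypothetical set of PII column names (`email`, `phone`), and a hypothetical `signup_date` field that should be an ISO date. Rows that pass are stored without further cleanup, so the real transformation still happens later inside the DWH.

```python
from datetime import date

# Hypothetical columns treated as PII in this sketch.
PII_FIELDS = {"email", "phone"}


def light_inspection(records):
    """Drop PII columns and reject rows whose signup_date is not an ISO date.

    Returns (accepted, rejected). Accepted rows are otherwise left raw,
    ready to be loaded and transformed later inside the DWH.
    """
    accepted, rejected = [], []
    for record in records:
        try:
            # Format check: must parse as YYYY-MM-DD.
            date.fromisoformat(record["signup_date"])
        except (KeyError, TypeError, ValueError):
            rejected.append(record)
            continue
        # PII removal: keep every field except the PII ones.
        accepted.append({k: v for k, v in record.items() if k not in PII_FIELDS})
    return accepted, rejected


if __name__ == "__main__":
    ok, bad = light_inspection([
        {"user_id": 1, "email": "a@example.com", "signup_date": "2024-01-15"},
        {"user_id": 2, "phone": "555-0100", "signup_date": "2024/01/15"},
    ])
    print(ok)   # [{'user_id': 1, 'signup_date': '2024-01-15'}]
    print(bad)  # [{'user_id': 2, 'phone': '555-0100', 'signup_date': '2024/01/15'}]
```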

Background: Why Did People Start Moving the “T”?

From what I’ve gathered, there are several reasons why ELT has been gaining popularity:

1. Improved performance of cloud DWHs

Modern cloud DWHs like BigQuery, Snowflake, and Redshift can scale computing resources on demand, making it possible to process large transformations in a short time.
In the past, it was common wisdom that DWHs were too weak for heavy processing; this is no longer always the case.

2. Lower storage costs

Cloud storage has become much cheaper, making it feasible to store raw data in full for long periods.
Keeping unprocessed data allows you to reprocess it into different forms later.

3. Easier to adapt to changing analysis needs

You no longer need to lock in a detailed schema at the start. By “just loading everything first,” it’s easier to add new metrics or analyses later.
You can simply change the SQL or dbt transformation logic and re-run, shortening development cycles.

4. Lower barriers to implementation and operation

You don’t necessarily need to maintain a dedicated ETL server or tool: a data transfer step plus in-DWH transformation can be enough, which makes it easier for small teams or new projects to get started.
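Points 2 and 3 above can be illustrated with a small sketch, again using SQLite as a stand-in for a DWH (an assumption for illustration) and hypothetical table and view names. The raw events are loaded once; a later analysis need (unique users per day) is met by adding a new SQL transformation over the same raw table, without extracting the data again. In a real project this logic might live in a dbt model, but here it is plain SQL.

```python
import sqlite3


def load_raw(conn: sqlite3.Connection) -> None:
    """Load hypothetical raw events once and keep them unmodified."""
    conn.execute("CREATE TABLE events_raw (user_id INTEGER, event TEXT, event_date TEXT)")
    conn.executemany(
        "INSERT INTO events_raw VALUES (?, ?, ?)",
        [
            (1, "view", "2024-05-01"),
            (1, "purchase", "2024-05-01"),
            (2, "view", "2024-05-01"),
            (2, "view", "2024-05-02"),
        ],
    )


# Original transformation: number of events per day.
DAILY_EVENTS_SQL = """
CREATE VIEW daily_events AS
SELECT event_date, COUNT(*) AS n_events
FROM events_raw
GROUP BY event_date
"""

# A later requirement: unique users per day. Because the raw table was kept,
# this is just one more SQL statement over the same data.
DAILY_USERS_SQL = """
CREATE VIEW daily_users AS
SELECT event_date, COUNT(DISTINCT user_id) AS n_users
FROM events_raw
GROUP BY event_date
"""


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    load_raw(db)
    db.execute(DAILY_EVENTS_SQL)
    db.execute(DAILY_USERS_SQL)  # added later, no re-extraction needed
    print(db.execute("SELECT * FROM daily_events ORDER BY event_date").fetchall())
    # [('2024-05-01', 3), ('2024-05-02', 1)]
    print(db.execute("SELECT * FROM daily_users ORDER BY event_date").fetchall())
    # [('2024-05-01', 2), ('2024-05-02', 1)]
```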

In short, my understanding is:

“It’s now faster and cheaper to do all the processing inside the DWH.”

This environment has made it more practical to move the “T” to the end (ELT).
That said, ETL hasn’t disappeared—depending on data sensitivity, governance needs, or other factors, ETL is still chosen in many cases.

Conclusion

To summarize:

ETL = Clean before storing
ELT = Store first, clean later

Honestly, I’m not sure if what I’ve learned this time will be directly useful in my work right away.
However, knowing that there’s this kind of difference in how data can be handled might make it a little easier to follow project discussions or read related documents.
If I ever face this decision in practice, I hope this memo will come in handy—and if this article offers any hint or reference for your own work or daily life, I’ll be glad.