DEV Community

Cover image for Incremental vs Delta Extraction and Incremental Load vs Upsert in ETL pipelines
Braeson Nyahera
Braeson Nyahera

Posted on

Incremental vs Delta Extraction and Incremental Load vs Upsert in ETL pipelines

Introduction

Modern systems generate vast amount of data daily if not hourly or every second, whether it is through creating entirely new data or just updating existing data. Engineers work around this by having frameworks on how each of this instances are handled.
Terms such as Incremental extraction, delta extraction,incremental load and upsert are frequently used in ELT/ETL processes.
Although these concepts are relatable, misunderstanding them impacts the optimality of the pipeline, hence it is necessary to know each as a unit of it's own.

Understanding where they fit

An ETL pipeline consists of three main steps:

  1. Extract
  2. Transform
  3. Load

Structure displaying the extraction and loading concepts
Incremental extraction and delta extraction are found the extract step whereas Incremental load and Upsert are found in the load step.

Incremental Extraction vs Delta Extraction

Incremental Extraction

Incremental extraction refers to the process of extracting data that has been changed since a certain point in time. This ensures that not all the data is extracted in every instance.
This strategy heavily depends on timestamps to know data which is new to be extracted.
For example: In a table containing 1000000 records, if only 500 records are added then those are the only ones which would be added.

Sample code:

SELECT * FROM orders 
WHERE last_modified > '2026-06-14 00:00:00';
Enter fullscreen mode Exit fullscreen mode

Key Limitation

Incremental extraction cannot detect hard deletes. If a record is removed from the source, it leaves no updated timestamp behind, so the deletion is invisible to this strategy. Deleted rows will persist in the destination as stale data.

Delta Extraction

Delta extraction works by capturing exact changes which have occurred since the last extraction. This include inserts, updates and deletions.
The term "delta" represents difference between two states of data. It depends on CDC(Change Data Capture) which includes; database transaction logs, triggers, or difference tables, rather than relying on timestamp to capture every insert, update, and delete.

Incremental Load vs Upsert

Incremental load

Incremental load is the process of loading only new or modified records. This will prevent the whole dataset being reloaded every other time making the process less efficient.
If out of 10000 records only 10 were changed or added then only the ones which were changed are the ones to be loaded.

Upsert

Upsert works as a combination of two processes, Insert and Update. This comes in when only appending data is not enough. This process firsts check if the record exists, if it does then it is updated but if it does not exist it is loaded.
For upsert to work you need a unique identifier and a logic to check if the record exists.

Sample code:

INSERT INTO customers (customer_id, name, city)
VALUES (101, 'Jane', 'Nairobi')
ON CONFLICT (customer_id)
DO UPDATE
SET
 name = EXCLUDED.name,
 city = EXCLUDED.city;
Enter fullscreen mode Exit fullscreen mode

Wrapping Up

These four concepts, incremental extraction, delta extraction, incremental load, and upsert, each solve a specific part of the same problem: keeping data systems in sync efficiently. Understanding them individually is what allows an engineer to design pipelines that are correct, performant, and maintainable.

The choice between them is rarely about which is "better" rather it is about what the data requires. Append-only data with no deletes calls for a simple incremental extract and load. Frequently updated records with hard deletes call for delta and precise upsert logic. Matching the strategy to the data is the mark of a well-designed pipeline.

Top comments (1)

Collapse
 
joseph_mwangi_3ae1f57a132 profile image
joseph mwangi

This was very informative. Nice stuff 👏🏿