Introduction
Modern systems generate vast amount of data daily if not hourly or every second, whether it is through creating entirely new data or just updating existing data. Engineers work around this by having frameworks on how each of this instances are handled.
Terms such as Incremental extraction, delta extraction,incremental load and upsert are frequently used in ELT/ETL processes.
Although these concepts are relatable, misunderstanding them impacts the optimality of the pipeline, hence it is necessary to know each as a unit of it's own.
Understanding where they fit
An ETL pipeline consists of three main steps:
- Extract
- Transform
- Load

Incremental extraction and delta extraction are found the extract step whereas Incremental load and Upsert are found in the load step.
Incremental Extraction vs Delta Extraction
Incremental Extraction
Incremental extraction refers to the process of extracting data that has been changed since a certain point in time. This ensures that not all the data is extracted in every instance.
This strategy heavily depends on timestamps to know data which is new to be extracted.
For example: In a table containing 1000000 records, if only 500 records are added then those are the only ones which would be added.
Sample code:
SELECT * FROM orders
WHERE last_modified > '2026-06-14 00:00:00';
Key Limitation
Incremental extraction cannot detect hard deletes. If a record is removed from the source, it leaves no updated timestamp behind, so the deletion is invisible to this strategy. Deleted rows will persist in the destination as stale data.
Delta Extraction
Delta extraction works by capturing exact changes which have occurred since the last extraction. This include inserts, updates and deletions.
The term "delta" represents difference between two states of data. It depends on CDC(Change Data Capture) which includes; database transaction logs, triggers, or difference tables, rather than relying on timestamp to capture every insert, update, and delete.
Incremental Load vs Upsert
Incremental load
Incremental load is the process of loading only new or modified records. This will prevent the whole dataset being reloaded every other time making the process less efficient.
If out of 10000 records only 10 were changed or added then only the ones which were changed are the ones to be loaded.
Upsert
Upsert works as a combination of two processes, Insert and Update. This comes in when only appending data is not enough. This process firsts check if the record exists, if it does then it is updated but if it does not exist it is loaded.
For upsert to work you need a unique identifier and a logic to check if the record exists.
Sample code:
INSERT INTO customers (customer_id, name, city)
VALUES (101, 'Jane', 'Nairobi')
ON CONFLICT (customer_id)
DO UPDATE
SET
name = EXCLUDED.name,
city = EXCLUDED.city;
Wrapping Up
These four concepts, incremental extraction, delta extraction, incremental load, and upsert, each solve a specific part of the same problem: keeping data systems in sync efficiently. Understanding them individually is what allows an engineer to design pipelines that are correct, performant, and maintainable.
The choice between them is rarely about which is "better" rather it is about what the data requires. Append-only data with no deletes calls for a simple incremental extract and load. Frequently updated records with hard deletes call for delta and precise upsert logic. Matching the strategy to the data is the mark of a well-designed pipeline.
Top comments (1)
This was very informative. Nice stuff 👏🏿