Gowtham Potureddi

Posted on May 23

INSERT, UPDATE, DELETE in SQL: Safe CRUD Patterns for Data Engineers

#python #sql #interview #dataengineering

sql insert, sql update, and sql delete are the three DML verbs that move every row in a warehouse: sql insert adds new rows (one row, many rows, or INSERT INTO … SELECT from another query), sql update changes existing rows in place (always paired with a WHERE clause unless you really mean "every row"), and sql delete removes rows (with WHERE, or via TRUNCATE for a fast whole-table reset). Together they form sql crud — the dml in sql vocabulary every sql interview questions panel tests, and every data engineering interview questions loop circles back to when the conversation turns from SELECT to data movement.

This guide walks through every clause in the sql data manipulation family that reviewers love to test in data engineering interview questions: the safe-CRUD mental model, insert into sql patterns (single-row, multi-row, INSERT … SELECT), the update statement sql family with UPDATE … FROM JOIN for cross-table updates, delete from sql vs TRUNCATE TABLE vs DROP TABLE (the three operators that look similar but behave nothing alike), upsert sql and merge sql with ON CONFLICT / MERGE INTO, transaction patterns (BEGIN / COMMIT / ROLLBACK), and the seven gotchas (missing WHERE clauses, fan-out from joins, lock escalation) that fail most candidates. Every section ends as sql interview questions with answers: a runnable PostgreSQL query, a traced execution, an output table, and a concept-by-concept why this works breakdown — the exact shape sql for data engineers rounds reward when crud in sql comes up.

When you want hands-on reps immediately after reading, browse data-manipulation SQL practice →, drill the SQL practice lane →, sharpen ETL SQL drills →, rehearse database problems →, or widen coverage on the full SQL practice library →.

On this page

Why INSERT, UPDATE, DELETE matter in data engineering interviews
INSERT — single-row, multi-row, and INSERT INTO SELECT
UPDATE — WHERE-clause discipline and cross-table UPDATE FROM JOIN
DELETE vs TRUNCATE vs DROP — three operators that look alike
UPSERT, MERGE, and ON CONFLICT — the idempotent write pattern
Transactions, rollback, and write safety
DML interview gotchas — missing WHERE, fan-out, lock escalation
Choosing the right DML operator (cheat sheet)
Frequently asked questions
Practice on PipeCode

1. Why INSERT, UPDATE, DELETE matter in data engineering interviews

Three verbs, one safety contract — that's the whole CRUD interview surface

The one-sentence invariant interviewers chase: sql insert adds rows, sql update changes rows in place, and sql delete removes rows; every one of them needs explicit row-targeting (VALUES, WHERE, or ON CONFLICT) or the statement silently touches the entire table — and that's the bug interviewers love to test. Once you internalise the three verbs plus the safety contract, every prompt in the crud sql / crud in sql cluster becomes "pick the right verb and prove your WHERE clause."

The three DML verbs at a glance.

INSERT INTO table (cols) VALUES (…) — adds new rows; multi-row form VALUES (…), (…), (…); INSERT INTO … SELECT ingests from another query (sql insert into select is the most-tested ETL shape).
UPDATE table SET col = expr WHERE … — changes existing rows that match the WHERE clause; update statement sql without a WHERE updates every row.
DELETE FROM table WHERE … — removes rows that match the WHERE clause; delete from sql without a WHERE empties the table (and runs row-by-row, unlike TRUNCATE).

The "safety contract" every CRUD statement must honour.

Always have a row-targeting clause — WHERE for UPDATE / DELETE; VALUES or SELECT for INSERT.
Wrap risky CRUD in a transaction — BEGIN; UPDATE …; SELECT *; ROLLBACK lets you preview the change before committing.
Test with SELECT first — SELECT * FROM t WHERE <predicate> shows exactly which rows would be touched.
Use RETURNING (Postgres / SQL Server OUTPUT) to get back the affected rows in the same statement.

Why interviewers love DML.

Every data engineering pipeline writes data — INSERT for ingest, UPDATE for slowly-changing dimensions, DELETE for soft-delete cascades, MERGE for upserts.
The where clause delete discipline separates juniors from seniors — every senior has a "I once forgot the WHERE" story.
Dialect divergence is large — INSERT … ON CONFLICT (Postgres) vs INSERT … ON DUPLICATE KEY UPDATE (MySQL) vs MERGE INTO (SQL Server / Oracle / Snowflake) all do the same job with different syntax.
Transaction semantics expose understanding of ACID — BEGIN / COMMIT / ROLLBACK are basic-but-tested.

What interviewers listen for.

Do you state the safety contract aloud before writing the UPDATE / DELETE? — "this needs a WHERE order_id = 42 to limit to that single row."
Do you wrap risky changes in a transaction? — BEGIN; … ROLLBACK for preview, COMMIT for commit.
Do you reach for INSERT … ON CONFLICT (or the dialect equivalent) for idempotent ingest? — senior signal.
Do you mention TRUNCATE vs DELETE performance for full-table resets? — bonus points.

Worked example — INSERT + UPDATE + DELETE in one transactional batch

Detailed explanation. Real pipelines bundle multiple DML statements into a single transaction so they all commit (or all roll back) atomically. The canonical shape: ingest new rows, update changed rows, delete obsolete rows, all between BEGIN and COMMIT.

Question. From a staging table staging_orders(order_id, status, amount), apply the changes to a live_orders table: insert new orders, update existing orders whose status changed, and delete orders flagged as 'cancel'. Use a single transaction.

Input. staging_orders slice plus live_orders slice.

staging.order_id	staging.status	staging.amount
1	paid	100
2	cancel	80
3	paid	150

live.order_id	live.status	live.amount
1	pending	100
2	pending	80

Code (PostgreSQL).

BEGIN;

INSERT INTO live_orders (order_id, status, amount)
SELECT s.order_id, s.status, s.amount
FROM staging_orders s
LEFT JOIN live_orders l ON l.order_id = s.order_id
WHERE l.order_id IS NULL
  AND s.status <> 'cancel';

UPDATE live_orders l
SET status = s.status,
    amount = s.amount
FROM staging_orders s
WHERE l.order_id = s.order_id
  AND s.status <> 'cancel'
  AND (l.status <> s.status OR l.amount <> s.amount);

DELETE FROM live_orders l
USING staging_orders s
WHERE l.order_id = s.order_id
  AND s.status = 'cancel';

COMMIT;

Step-by-step explanation.

BEGIN opens a transaction — every following statement is part of the same atomic batch.
INSERT INTO … SELECT adds new orders (left-anti join finds rows in staging that don't exist in live).
UPDATE … FROM staging_orders mutates existing rows whose status or amount changed.
DELETE … USING staging_orders removes live rows that staging marks as cancelled.
COMMIT makes the whole batch durable; ROLLBACK would undo every statement.

Output (after commit).

order_id	status	amount
1	paid	100
3	paid	150

Rule of thumb: bundle related DML in a transaction; preview with SELECT before each statement; never skip the WHERE.

SQL
Topic — data-manipulation
SQL CRUD drills

Practice →

SQL
Topic — etl
ETL SQL practice

Practice →

2. INSERT — single-row, multi-row, and INSERT INTO SELECT

`sql insert` and `insert into sql` — three shapes that cover 95% of inserts

The sql insert family has three shapes that every reviewer expects you to know: single-row inserts for tests and seed data, multi-row inserts for batches, and INSERT INTO … SELECT for ingest from another table or query. Every dialect supports all three; the syntax is almost identical across PostgreSQL, MySQL, SQL Server, Oracle, and Snowflake.

Shape 1 — single-row INSERT INTO … VALUES (…).

Signature — INSERT INTO <table> (<col1>, <col2>, …) VALUES (<val1>, <val2>, …).
insert into sql with explicit column list is the canonical interview shape; skipping the column list is a code smell.
Default values — columns with DEFAULT constraints are filled automatically when omitted from the column list.
SERIAL / IDENTITY / AUTO_INCREMENT — sequence-backed columns auto-generate values; omit them from the column list.

Shape 2 — multi-row INSERT INTO … VALUES (…), (…), (…).

One statement, many rows — INSERT INTO t (a, b) VALUES (1, 'x'), (2, 'y'), (3, 'z').
Atomic — all rows insert or none do (one transactional unit).
Faster than N single-row inserts by 10-100× — the planner builds one execution plan, not N.
MySQL caveat — the max_allowed_packet server setting caps the statement length; very large multi-row inserts need to be batched.

Shape 3 — INSERT INTO … SELECT (the ETL workhorse).

Signature — INSERT INTO <target> (<cols>) SELECT <expr_list> FROM <source> WHERE <predicate>.
sql insert into select is what every staging-to-live ingest in a warehouse looks like.
Column count and types must match between the SELECT projection and the target table's column list.
Can pull from any query — JOINs, CTEs, set operations, subqueries; the source need not be a base table.

bulk insert sql — beyond multi-row VALUES.

PostgreSQL COPY FROM — the fastest bulk-load operator; reads from a file or STDIN, bypasses the WAL formatter overhead.
MySQL LOAD DATA INFILE — equivalent for CSV ingestion.
SQL Server BULK INSERT / bcp — the SQL Server bulk loader.
Snowflake COPY INTO — pulls from S3 / GCS / Azure Blob.
Use them when ingesting > 100k rows from a file; INSERT … VALUES is too slow at scale.

INSERT … RETURNING — get back what you wrote.

PostgreSQL / Oracle — INSERT INTO t (a) VALUES (1) RETURNING id returns the generated id without a follow-up SELECT.
SQL Server — INSERT … OUTPUT INSERTED.* does the same.
Use case — capturing auto-generated primary keys; chaining inserts that need the new id; loading without a second round-trip.

Common INSERT pitfalls.

No column list — INSERT INTO t VALUES (…) assumes positional matching; one schema change breaks every caller.
NULL into NOT NULL — raises an error; INSERT … VALUES (NULL, …) into a NOT NULL column fails loudly.
Type mismatches — explicit cast or TRY_CAST upstream avoids the silent-coercion bugs (see Blog78).
Forgotten RETURNING — clients that need the new id make two round-trips instead of one.

SQL interview question — INSERT INTO … SELECT a deduplicated staging slice

Assume staging_users(email, name, created_at) may contain duplicate emails. Load distinct users into users(email, name, created_at) using INSERT INTO … SELECT.

Solution Using INSERT INTO … SELECT + DISTINCT ON

Detailed explanation. The standard ETL shape: read from staging, dedupe to one row per natural key, project the right columns, insert into the live table. DISTINCT ON (Postgres) picks one row per email; the equivalent in portable SQL is ROW_NUMBER() = 1 inside a CTE.

Code (PostgreSQL).

INSERT INTO users (email, name, created_at)
SELECT DISTINCT ON (email)
       email, name, created_at
FROM staging_users
ORDER BY email, created_at;

Equivalent in portable SQL.

INSERT INTO users (email, name, created_at)
SELECT email, name, created_at
FROM (
    SELECT email, name, created_at,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at) AS rn
    FROM staging_users
) ranked
WHERE rn = 1;

Step-by-step trace.

step	relation	outcome
1	Scan `staging_users`	every staging row read
2	`DISTINCT ON (email) ... ORDER BY email, created_at`	first row per email
3	Project `(email, name, created_at)`	three columns
4	`INSERT INTO users`	dedup'd rows land in live

Output: users now has one row per unique email from staging.

Why this works — concept by concept:

INSERT INTO … SELECT — the canonical ETL shape; reads from one relation and writes to another in a single statement.
DISTINCT ON (email) — Postgres-specific; pairs with ORDER BY email, created_at to pick the earliest row per email.
ROW_NUMBER() = 1 — portable alternative; works on every dialect via window functions.
No PRIMARY KEY violation — the dedupe guarantees one row per email; if email is the PK, raw insert without dedupe would error.
Cost — Θ(n) scan + Θ(n log n) sort for the dedupe; the insert itself is Θ(distinct(n)).

SQL
Topic — data-manipulation
INSERT SQL drills

Practice →

SQL
Topic — etl
INSERT + ETL patterns

Practice →

3. UPDATE — WHERE-clause discipline and cross-table UPDATE FROM JOIN

`sql update` and `update statement sql` — never run without WHERE

sql update changes existing rows in place. The single most-asked sql interview questions prompt in this space is "what happens if you forget the WHERE clause?" — the answer is "every row in the table gets updated, silently." This is the update statement sql safety lesson every senior data engineer has learned the hard way.

The UPDATE signature.

Single-table — UPDATE <table> SET <col1> = <expr1>, <col2> = <expr2> WHERE <predicate>.
The WHERE clause is not optional in practice — without it, every row is updated.
SET accepts any expression — SET amount = amount * 1.1 (10% increase); SET status = 'paid' WHERE ….
RETURNING (Postgres / Oracle) — UPDATE … RETURNING id, old_status, new_status returns affected rows.

UPDATE … FROM JOIN — cross-table update (the senior pattern).

PostgreSQL — UPDATE target t SET col = s.val FROM source s WHERE t.id = s.id.
SQL Server — UPDATE target SET col = s.val FROM target t JOIN source s ON t.id = s.id.
MySQL — UPDATE target t JOIN source s ON t.id = s.id SET t.col = s.val.
Oracle / Snowflake — use MERGE INTO … USING … instead.

The WHERE clause patterns.

Single-row update — WHERE id = 42 (the safest predicate; touches at most one row).
Bounded set — WHERE id IN (1, 2, 3) or WHERE status = 'pending'.
Cross-table — WHERE t.id = s.id in the JOIN form.
Conditional — SET status = CASE WHEN amount > 100 THEN 'large' ELSE 'small' END WHERE ….

sql update multiple columns — set many fields in one statement.

One SET clause, multiple assignments — SET col1 = val1, col2 = val2, col3 = val3 WHERE id = 42.
Use a sub-select for many fields — SET (col1, col2) = (SELECT a, b FROM other WHERE …) in Postgres / Oracle.
MySQL row constructor — SET (col1, col2) = (SELECT a, b FROM other) since 8.0+.

UPDATE with a subquery (the lookup pattern).

UPDATE orders
SET region = (SELECT r.region
              FROM customers c
              JOIN regions r ON c.region_id = r.id
              WHERE c.customer_id = orders.customer_id)
WHERE region IS NULL;

Per-row correlated subquery — runs once per matched row.
Pre-aggregate with a CTE for performance when the subquery is expensive.
Watch for NULL — if the subquery returns no row, the column is set to NULL; guard with WHERE EXISTS.

Common UPDATE pitfalls.

Missing WHERE — the canonical interview trap; updates every row in the table.
Implicit type coercion — SET int_col = '42' triggers per-row casting; use CAST (see Blog78).
Triggering chain reactions — BEFORE UPDATE triggers fire on every row; can compound cost.
Updating a primary key — usually disallowed by foreign-key constraints; if needed, delete and reinsert.

SQL interview question — UPDATE FROM JOIN to backfill region from customer

Assume orders(order_id, customer_id, region) where region is sometimes NULL, and customers(customer_id, region) always has the correct value. Backfill orders.region from customers.region for the rows where orders.region IS NULL.

Solution Using UPDATE … FROM JOIN

Detailed explanation. Cross-table UPDATE is the textbook backfill shape. Postgres uses UPDATE … FROM; SQL Server uses UPDATE … FROM target JOIN source; MySQL uses UPDATE target JOIN source. The semantics are identical; the syntax differs.

Code (PostgreSQL).

UPDATE orders o
SET region = c.region
FROM customers c
WHERE o.customer_id = c.customer_id
  AND o.region IS NULL;

Equivalent in MySQL.

UPDATE orders o
JOIN customers c ON o.customer_id = c.customer_id
SET o.region = c.region
WHERE o.region IS NULL;

Equivalent in SQL Server.

UPDATE o
SET region = c.region
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.region IS NULL;

Step-by-step trace.

step	relation	outcome
1	Scan `orders`	all order rows
2	Hash JOIN with `customers` on `customer_id`	matching customer per order
3	`WHERE o.region IS NULL`	only rows needing a backfill
4	`SET region = c.region`	mutate the target column
5	Commit (implicit if no transaction wraps it)	changes durable

Output: every NULL region in orders is replaced with the customer's actual region.

Why this works — concept by concept:

UPDATE … FROM customers — Postgres's syntax for "update target using a joined source".
WHERE o.region IS NULL — the safety predicate; only touches rows that need the backfill.
Hash join under the hood — the planner picks an efficient join strategy; Θ(n + m) for the lookup.
No triggers fire on unchanged rows — the predicate keeps the change set minimal.
Wrap in a transaction for preview — BEGIN; UPDATE …; SELECT * FROM orders WHERE region IS NULL; ROLLBACK (or COMMIT).
Cost — single scan of orders + hash probe into customers; the update is Θ(matched_rows).

SQL
Topic — data-manipulation
UPDATE SQL drills

Practice →

SQL
Topic — joins
JOIN + UPDATE patterns

Practice →

4. DELETE vs TRUNCATE vs DROP — three operators that look alike

`sql delete`, `TRUNCATE`, and `DROP` — three operators, three very different behaviours

sql delete removes rows that match a WHERE predicate. TRUNCATE TABLE removes every row in a fast, minimally-logged operation. DROP TABLE removes the table itself (rows AND schema). They look similar at first glance; their behaviour is wildly different — and the delete vs truncate vs drop comparison is one of the most-asked sql interview questions in the DML cluster.

DELETE FROM … WHERE … — row-by-row removal.

Signature — DELETE FROM <table> WHERE <predicate>.
Per-row WAL logging — every deleted row is logged for rollback; expensive on large tables.
Triggers fire — BEFORE DELETE and AFTER DELETE triggers run per row.
Foreign keys checked — ON DELETE CASCADE / RESTRICT rules apply.
Rollback supported — DELETE inside a transaction can be rolled back.
Identity sequence not reset — the next inserted row gets the next sequence value.
delete from sql without WHERE — empties the table row-by-row (slow); never the right tool for full-table reset.

TRUNCATE TABLE — fast whole-table empty.

Signature — TRUNCATE TABLE <table> (no WHERE clause possible).
Minimal logging — most engines log the page-level deallocation, not per-row.
Faster than DELETE by 10-1000× on large tables.
Triggers do NOT fire — most engines skip row-level triggers.
Identity sequence reset (dialect-dependent) — Postgres TRUNCATE … RESTART IDENTITY; SQL Server resets by default; MySQL resets by default.
truncate vs delete — the textbook interview comparison; pick TRUNCATE for full-table reset, DELETE for predicate-based removal.
Cannot be rolled back in MySQL InnoDB — implicit commit; Postgres and SQL Server support transactional truncate.

DROP TABLE — remove the table itself.

Signature — DROP TABLE <table>.
Removes rows, indexes, constraints, triggers, AND the table definition.
Irreversible — restoring requires a backup or CREATE TABLE from a saved script.
Foreign keys — DROP TABLE … CASCADE removes dependent objects too.
Use case — temporary tables, dev/test cleanup, deprecated schemas.

delete vs truncate vs drop — the decision table.

Operator	Removes rows?	Removes schema?	Per-row triggers?	WAL cost	Rollback (in tx)?	Sequence reset?
`DELETE`	Yes (per WHERE)	No	Yes	per row (high)	Yes	No
`TRUNCATE`	Yes (all)	No	No (most engines)	minimal	Yes (PG/SQL Server)	Yes (most engines)
`DROP`	Yes (all)	Yes	n/a	minimal	Yes (PG, SQL Server)	n/a

delete from sql with where — the safe shape.

Always start with SELECT * FROM t WHERE <predicate> to confirm the row set.
Then change SELECT * to DELETE keeping the same WHERE.
Wrap in BEGIN; DELETE …; SELECT COUNT(*) FROM t WHERE …; ROLLBACK to preview.
RETURNING * (Postgres) shows exactly which rows were deleted.

DELETE with JOIN (the related-tables shape).

-- PostgreSQL: DELETE … USING
DELETE FROM orders o
USING customers c
WHERE o.customer_id = c.customer_id
  AND c.status = 'closed';

-- MySQL: DELETE alias FROM target JOIN source
DELETE o
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE c.status = 'closed';

-- SQL Server: DELETE … FROM
DELETE o
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE c.status = 'closed';

SQL interview question — purge orders for closed customers

Assume orders(order_id, customer_id, status) and customers(customer_id, status). Delete every order whose customer's status is 'closed'. Use a transaction.

Solution Using DELETE … USING + transaction

Detailed explanation. Wrap the destructive operation in a transaction so a preview is safe. DELETE … USING (Postgres) is the multi-table delete shape; SQL Server uses DELETE … FROM; MySQL uses DELETE alias FROM … JOIN ….

Code (PostgreSQL).

BEGIN;

-- preview first
SELECT o.order_id, o.customer_id, c.status
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE c.status = 'closed';

DELETE FROM orders o
USING customers c
WHERE o.customer_id = c.customer_id
  AND c.status = 'closed'
RETURNING o.order_id, o.customer_id;

COMMIT;

Step-by-step trace.

step	relation	outcome
1	`BEGIN`	open transaction
2	Preview `SELECT`	confirm the rows that would be deleted
3	`DELETE … USING` joins to `customers`	identify orders whose customer is closed
4	`RETURNING o.order_id, o.customer_id`	capture deleted-row identifiers
5	`COMMIT`	changes durable

Output: the RETURNING clause emits one row per deleted order.

Why this works — concept by concept:

BEGIN / COMMIT wraps the destructive op — a ROLLBACK mid-transaction undoes the delete without permanent damage.
Preview SELECT first — confirms the predicate matches the expected rows.
DELETE … USING — Postgres's multi-table delete syntax; same idea as MySQL's DELETE a FROM a JOIN b.
RETURNING captures deleted IDs — useful for downstream logging or compensating actions.
No CASCADE concerns — the delete is on orders only; customers rows are untouched.
Cost — single scan + hash probe; deletes Θ(matched_rows) rows.

SQL
Topic — data-manipulation
DELETE SQL drills

Practice →

SQL
Topic — database
Database SQL library

Practice →

5. UPSERT, MERGE, and ON CONFLICT — the idempotent write pattern

`upsert sql` and `merge sql` — insert if new, update if exists

upsert sql is the idempotent write pattern: insert the row if it doesn't exist, update it if it does. Every dialect spells this differently — INSERT … ON CONFLICT DO UPDATE (Postgres), INSERT … ON DUPLICATE KEY UPDATE (MySQL), MERGE INTO (SQL Server / Oracle / Snowflake / BigQuery) — but the semantics are identical. The merge into sql server / insert on conflict postgres family is the second-most-asked dml in sql interview topic after delete vs truncate.

PostgreSQL INSERT … ON CONFLICT DO UPDATE — the most readable.

INSERT INTO users (email, name, last_login)
VALUES ('alice@example.com', 'Alice', NOW())
ON CONFLICT (email) DO UPDATE
SET name       = EXCLUDED.name,
    last_login = EXCLUDED.last_login;

ON CONFLICT (col) — names the unique key that triggers conflict resolution.
DO UPDATE SET … — the update clause; EXCLUDED.col references the values from the attempted insert.
DO NOTHING — skip the row instead of updating; useful for idempotent inserts.
insert on conflict postgres is the most-searched form of this pattern.

MySQL INSERT … ON DUPLICATE KEY UPDATE.

INSERT INTO users (email, name, last_login)
VALUES ('alice@example.com', 'Alice', NOW())
ON DUPLICATE KEY UPDATE
    name       = VALUES(name),
    last_login = VALUES(last_login);

ON DUPLICATE KEY — fires when any unique key would be violated.
VALUES(col) — references the value from the attempted insert (deprecated in MySQL 8.0.20+; replaced by AS new SET col = new.col).
Modern MySQL — INSERT INTO users (…) VALUES (…) AS new ON DUPLICATE KEY UPDATE col = new.col.

SQL Server / Oracle / Snowflake MERGE INTO.

MERGE INTO users AS target
USING (VALUES ('alice@example.com', 'Alice', CURRENT_TIMESTAMP)) AS src(email, name, last_login)
   ON target.email = src.email
WHEN MATCHED THEN
    UPDATE SET name = src.name, last_login = src.last_login
WHEN NOT MATCHED THEN
    INSERT (email, name, last_login) VALUES (src.email, src.name, src.last_login);

MERGE INTO target USING source ON predicate — ANSI-standard merge.
WHEN MATCHED THEN UPDATE — runs when the predicate matches.
WHEN NOT MATCHED THEN INSERT — runs when the predicate doesn't match.
WHEN NOT MATCHED BY SOURCE THEN DELETE (SQL Server) — full three-way merge.
merge into sql server has been GA since SQL Server 2008; PostgreSQL added MERGE in 15.

Snowflake / BigQuery MERGE.

Snowflake — uses the same MERGE INTO syntax as SQL Server; very common in cloud warehouse interviews.
BigQuery — MERGE with the same shape; used heavily in slowly-changing-dimension (SCD) Type 2 pipelines.

The use cases that demand UPSERT.

Idempotent ingest — re-running a pipeline must not double-insert.
Slowly-changing dimensions — customer dimension updates without losing history (combine MERGE with effective-date columns).
Event aggregation — running totals that update if the (entity, day) key exists, insert otherwise.
CDC / change-data-capture — apply inserts, updates, and deletes from a source change feed.

When to reach for which.

PostgreSQL primary stack — INSERT … ON CONFLICT DO UPDATE; most readable, native, and well-supported.
MySQL primary stack — INSERT … ON DUPLICATE KEY UPDATE; equally native.
SQL Server / Oracle / Snowflake / BigQuery — MERGE INTO; the only option, also the most powerful (three-way matched / not-matched / not-matched-by-source).
PostgreSQL 15+ — also has MERGE; pick ON CONFLICT for upserts and MERGE for three-way logic.

SQL
Topic — data-manipulation
UPSERT / MERGE drills

Practice →

SQL
Topic — etl
ETL + UPSERT patterns

Practice →

6. Transactions, rollback, and write safety

`BEGIN`, `COMMIT`, `ROLLBACK` — the safety net every CRUD statement deserves

Every destructive sql update or sql delete belongs inside a transaction. The pattern is the same on every dialect: open the transaction, run the DML, inspect the result, then commit or roll back. The cost is one extra line of SQL; the upside is that you can preview and undo any mistake before it becomes permanent.

The canonical preview-and-commit pattern.

BEGIN;

-- preview the change
SELECT * FROM orders WHERE region = 'unknown';

-- do the change
UPDATE orders SET region = 'EU' WHERE region = 'unknown';

-- inspect again
SELECT * FROM orders WHERE region = 'unknown';   -- should be 0 rows
SELECT * FROM orders WHERE region = 'EU';         -- should include the updated rows

-- if happy:
COMMIT;
-- if not:
-- ROLLBACK;

Transaction primitives by dialect.

PostgreSQL — BEGIN; … COMMIT; or BEGIN; … ROLLBACK;; also SAVEPOINT and ROLLBACK TO SAVEPOINT.
MySQL — START TRANSACTION; … COMMIT;; SET autocommit = 0 to make every statement transactional.
SQL Server — BEGIN TRANSACTION; … COMMIT TRANSACTION; or ROLLBACK TRANSACTION.
Oracle — SET TRANSACTION READ WRITE; … COMMIT;; implicit transaction starts on first DML.
Snowflake — BEGIN; … COMMIT;; full ACID semantics.

ACID properties — what transactions guarantee.

Atomicity — all DML in the transaction commits together, or none does.
Consistency — constraints (NOT NULL, foreign keys, unique) are checked at commit time.
Isolation — concurrent transactions don't see each other's uncommitted changes (depends on isolation level).
Durability — once committed, the change survives a crash (WAL / redo log).

Isolation levels — the cheat-bullet.

READ UNCOMMITTED — sees uncommitted writes from others ("dirty reads"); rarely used.
READ COMMITTED — sees only committed writes; the default in Postgres, SQL Server, Oracle.
REPEATABLE READ — repeated reads in the same transaction return the same result; MySQL default.
SERIALIZABLE — strongest; transactions appear to execute one at a time; sometimes via retry.

ROLLBACK — undo without commit.

Inside a transaction — ROLLBACK discards every change since BEGIN.
SAVEPOINT s1 + ROLLBACK TO s1 — partial rollback to a named point inside a transaction.
Implicit rollback — on session disconnect, network error, server crash; uncommitted changes are discarded.
The senior signal — knowing that ROLLBACK is your friend and using it freely.

The cost of long-running transactions.

Lock duration — a long UPDATE holds row-level locks until commit; concurrent writers wait.
WAL / undo log size — large transactions inflate the log; vacuum / log truncation depends on commit.
Snapshot bloat (Postgres) — MVCC snapshots accumulate; long transactions block VACUUM from reclaiming space.
Production discipline — break large batches into smaller transactions; commit every 1k–10k rows.

RETURNING / OUTPUT — pair with transactions.

UPDATE … RETURNING … (Postgres) — captures changed rows inside the transaction.
OUTPUT INSERTED.*, DELETED.* (SQL Server) — captures both old and new values; perfect for audit logs.
Use case — write an audit-trail INSERT in the same transaction that captures the UPDATE/DELETE side-effects.

SQL
Topic — database
Transactions SQL drills

Practice →

SQL
Topic — data-manipulation
Transactional CRUD patterns

Practice →

7. DML interview gotchas — missing WHERE, fan-out, lock escalation

The seven bugs interviewers test most often on CRUD

DML has a small surface area but a long tail of edge cases that fail candidates. These are the seven gotchas reviewers test most often when crud sql comes up in an interview.

Gotcha 1 — UPDATE / DELETE without WHERE.

The bug — UPDATE orders SET status = 'paid' updates every row in orders.
Symptom — production-data corruption; "I just lost an afternoon" engineer slack messages.
Fix — always preview with SELECT * FROM t WHERE <predicate> first; wrap in a transaction.
The discipline — type WHERE before SET when writing UPDATE from scratch.

Gotcha 2 — fan-out from UPDATE … FROM JOIN.

The bug — joining to a source table with multiple matching rows updates the target row N times with the last match (engine-dependent).
Symptom — non-deterministic results; "the update sometimes does the right thing".
Fix — pre-aggregate the source to one row per join key inside a CTE.

Gotcha 3 — DELETE cascade chains too far.

The bug — ON DELETE CASCADE foreign keys propagate deletes through unrelated tables.
Symptom — deleting one customer wipes out their orders, their payments, their reviews — none of which the engineer intended.
Fix — confirm the foreign-key topology; use RESTRICT instead of CASCADE for safety; soft-delete via a deleted_at column instead.

Gotcha 4 — TRUNCATE cannot be rolled back in MySQL.

The behaviour — MySQL InnoDB TRUNCATE does an implicit commit; once issued, no ROLLBACK recovers the data.
Symptom — "I wrapped it in START TRANSACTION and it still committed".
Fix — use DELETE FROM t inside a transaction for previewable removal; or take a backup before TRUNCATE.

Gotcha 5 — INSERT … SELECT without dedup violates UNIQUE.

The bug — staging table has duplicates; insert into a table with a UNIQUE constraint errors on the second occurrence.
Symptom — ERROR: duplicate key value violates unique constraint.
Fix — dedup with DISTINCT ON / ROW_NUMBER() = 1 (Postgres) or INSERT … ON CONFLICT DO NOTHING for an idempotent ingest.

Gotcha 6 — implicit type coercion on the SET side of UPDATE.

The bug — SET int_col = '42' silently parses the string; if the string is 'abc' the update errors mid-batch and leaves the table partially updated (if no transaction wraps it).
Fix — explicit CAST (see Blog78); validate upstream; wrap in a transaction so partial application can roll back.

Gotcha 7 — lock escalation on large UPDATE / DELETE.

The behaviour — many row-level locks get promoted to table-level locks above an engine-specific threshold; concurrent writers block until the operation completes.
Symptom — production app freezes for the duration of a multi-million-row UPDATE.
Fix — batch the change in chunks (WHERE id BETWEEN 1 AND 10000, LIMIT 10000, etc.) and commit between chunks.

SQL
Topic — data-manipulation
DML gotcha drills

Practice →

SQL
Topic — database
Database safety drills

Practice →

Choosing the right DML operator (cheat sheet)

A one-screen cheat sheet for using SQL INSERT, UPDATE, DELETE — pick the verb that matches your intent, then the dialect-specific syntax.

You want to …	Operator	Notes
Add new rows from literals	`INSERT INTO t (cols) VALUES (…)`	Always include the column list
Add many rows in one statement	`INSERT INTO t (cols) VALUES (…), (…), (…)`	Atomic; 10-100× faster than N inserts
Add rows from another table	`INSERT INTO t (cols) SELECT … FROM s`	The ETL workhorse
Bulk-load from a file	`COPY` (PG) / `LOAD DATA INFILE` (MySQL) / `BULK INSERT` (SQL Server)	For > 100k rows
Change existing rows	`UPDATE t SET … WHERE …`	Always `WHERE`
Update from another table	`UPDATE t SET col = s.val FROM s WHERE …` (PG)	`UPDATE t JOIN s` (MySQL); `UPDATE t … FROM` (SQL Server)
Remove specific rows	`DELETE FROM t WHERE …`	Per-row WAL; supports rollback
Empty a whole table fast	`TRUNCATE TABLE t`	Minimally logged; resets identity
Remove the table itself	`DROP TABLE t`	Irreversible without backup
Insert if new, update if exists	`INSERT … ON CONFLICT DO UPDATE` (PG)	`ON DUPLICATE KEY UPDATE` (MySQL); `MERGE INTO` (SQL Server, Oracle, Snowflake)
Idempotent insert (skip duplicates)	`INSERT … ON CONFLICT DO NOTHING`	Safe for re-runnable pipelines
Three-way merge (insert/update/delete)	`MERGE INTO t USING s ON …`	SCD Type 2 pattern
Preview before commit	`BEGIN; SELECT …; DML; SELECT …; ROLLBACK;`	The safety pattern
Capture affected rows	`… RETURNING ` (PG/Oracle) / `OUTPUT INSERTED., DELETED.*` (SQL Server)	One round-trip audit

Frequently asked questions

What's the difference between DELETE, TRUNCATE, and DROP in SQL?

DELETE removes rows that match a WHERE predicate — it's row-by-row, fully logged, triggers fire, foreign keys are checked, and the change can be rolled back inside a transaction. TRUNCATE TABLE removes every row in a single fast operation — minimally logged, no WHERE allowed, no triggers fire on most engines, the identity sequence usually resets, and it's 10-1000× faster than DELETE on large tables (but cannot be rolled back in MySQL InnoDB). DROP TABLE removes the table itself — rows AND schema, indexes, constraints, triggers — and is irreversible without a backup. The canonical interview rule: DELETE for row-level removal with rollback safety; TRUNCATE for fast whole-table reset; DROP for deleting the table definition entirely.

How do I write an UPSERT in SQL?

Every dialect spells UPSERT differently. PostgreSQL — INSERT INTO t (cols) VALUES (…) ON CONFLICT (unique_col) DO UPDATE SET col = EXCLUDED.col. MySQL — INSERT INTO t (cols) VALUES (…) ON DUPLICATE KEY UPDATE col = VALUES(col) (or the modern … AS new ON DUPLICATE KEY UPDATE col = new.col since 8.0.20). SQL Server / Oracle / Snowflake / BigQuery / PostgreSQL 15+ — MERGE INTO t USING source ON match_predicate WHEN MATCHED THEN UPDATE SET … WHEN NOT MATCHED THEN INSERT (…) VALUES (…). The semantics are identical across all of them: insert if no row matches the unique key, update if a row already exists. UPSERT is the foundation of idempotent pipelines — re-running an ingest must not double-insert, and INSERT … ON CONFLICT / MERGE makes that a one-liner.

What happens if I run UPDATE or DELETE without a WHERE clause?

It updates or deletes every row in the table — silently, with no warning. This is the canonical sql interview questions trap and the source of more "I just nuked production" stories than any other DML mistake. The defence: never write UPDATE or DELETE from a blank starting point; type the WHERE clause first, then the SET / FROM. Wrap risky DML in a transaction (BEGIN; UPDATE …; SELECT *; ROLLBACK). Preview with SELECT * FROM t WHERE <predicate> to confirm the row set before changing the verb. Some engines (MySQL 8 with safe-updates mode) refuse to execute UPDATE / DELETE without a WHERE clause that includes a key column — enable this setting in development environments.

How do I update one table based on values in another table?

Use cross-table UPDATE — the dialect-specific syntax differs but the semantics are identical. PostgreSQL — UPDATE target t SET col = s.val FROM source s WHERE t.id = s.id. MySQL — UPDATE target t JOIN source s ON t.id = s.id SET t.col = s.val. SQL Server — UPDATE t SET col = s.val FROM target t JOIN source s ON t.id = s.id. Oracle / Snowflake — use MERGE INTO target USING source ON (…) WHEN MATCHED THEN UPDATE SET …. Always include a WHERE clause (or a MERGE join predicate) that scopes the update; pre-aggregate the source to one row per join key inside a CTE if the source can have duplicates, otherwise the update fans out non-deterministically.

Should I use INSERT INTO SELECT or COPY / LOAD DATA INFILE for bulk loading?

For under ~100k rows, INSERT INTO t (cols) SELECT … FROM staging (or multi-row INSERT … VALUES) is fine — the planner builds one execution plan and writes are batched. For larger loads, switch to the dialect's bulk loader: PostgreSQL COPY FROM (the fastest by far for file ingest, bypasses much of the WAL overhead), MySQL LOAD DATA INFILE, SQL Server BULK INSERT or bcp, Snowflake COPY INTO (from S3 / GCS / Azure), BigQuery's LOAD DATA or table-loader API. The bulk loaders skip per-row WAL formatting and trigger firing where possible — they trade some safety for raw throughput. Inside an ETL pipeline, the standard pattern is: bulk-load into a staging table, then run INSERT INTO live SELECT … FROM staging with explicit deduplication and constraint checks.

Practice on PipeCode

PipeCode ships 450+ data-engineering interview problems — including PostgreSQL-first SQL practice keyed to DML CRUD (INSERT, UPDATE, DELETE), UPSERT patterns with ON CONFLICT / MERGE, transactions, ETL ingest, and the dialect quirks that fail candidates who memorise only one engine. Whether you're drilling sql interview questions with answers for sql for data engineers loops or grinding through data engineering interview questions end-to-end, the practice library mirrors the same safety-first mental model this guide teaches.

Kick off via Explore practice →; drill the data-manipulation SQL lane →; fan out into the ETL SQL lane →; rehearse database problems →; reinforce JOIN + UPDATE patterns →; widen coverage on the full SQL practice library →.

DEV Community

INSERT, UPDATE, DELETE in SQL: Safe CRUD Patterns for Data Engineers

1. Why INSERT, UPDATE, DELETE matter in data engineering interviews

Three verbs, one safety contract — that's the whole CRUD interview surface

Worked example — INSERT + UPDATE + DELETE in one transactional batch

2. INSERT — single-row, multi-row, and INSERT INTO SELECT

`sql insert` and `insert into sql` — three shapes that cover 95% of inserts

SQL interview question — INSERT INTO … SELECT a deduplicated staging slice

Solution Using INSERT INTO … SELECT + DISTINCT ON

3. UPDATE — WHERE-clause discipline and cross-table UPDATE FROM JOIN

`sql update` and `update statement sql` — never run without WHERE

SQL interview question — UPDATE FROM JOIN to backfill region from customer

Solution Using UPDATE … FROM JOIN

4. DELETE vs TRUNCATE vs DROP — three operators that look alike

`sql delete`, `TRUNCATE`, and `DROP` — three operators, three very different behaviours

SQL interview question — purge orders for closed customers

Solution Using DELETE … USING + transaction

5. UPSERT, MERGE, and ON CONFLICT — the idempotent write pattern

`upsert sql` and `merge sql` — insert if new, update if exists

6. Transactions, rollback, and write safety

`BEGIN`, `COMMIT`, `ROLLBACK` — the safety net every CRUD statement deserves

7. DML interview gotchas — missing WHERE, fan-out, lock escalation

The seven bugs interviewers test most often on CRUD

Choosing the right DML operator (cheat sheet)

Frequently asked questions

What's the difference between DELETE, TRUNCATE, and DROP in SQL?

How do I write an UPSERT in SQL?

What happens if I run UPDATE or DELETE without a WHERE clause?

How do I update one table based on values in another table?

Should I use INSERT INTO SELECT or COPY / LOAD DATA INFILE for bulk loading?

Practice on PipeCode

Top comments (0)

1. Why INSERT, UPDATE, DELETE matter in data engineering interviews

Three verbs, one safety contract — that's the whole CRUD interview surface

Worked example — INSERT + UPDATE + DELETE in one transactional batch

2. INSERT — single-row, multi-row, and INSERT INTO SELECT

sql insert and insert into sql — three shapes that cover 95% of inserts

SQL interview question — INSERT INTO … SELECT a deduplicated staging slice

Solution Using INSERT INTO … SELECT + DISTINCT ON

3. UPDATE — WHERE-clause discipline and cross-table UPDATE FROM JOIN

sql update and update statement sql — never run without WHERE

SQL interview question — UPDATE FROM JOIN to backfill region from customer

Solution Using UPDATE … FROM JOIN

4. DELETE vs TRUNCATE vs DROP — three operators that look alike

sql delete, TRUNCATE, and DROP — three operators, three very different behaviours

SQL interview question — purge orders for closed customers

Solution Using DELETE … USING + transaction

5. UPSERT, MERGE, and ON CONFLICT — the idempotent write pattern

upsert sql and merge sql — insert if new, update if exists

6. Transactions, rollback, and write safety

BEGIN, COMMIT, ROLLBACK — the safety net every CRUD statement deserves

7. DML interview gotchas — missing WHERE, fan-out, lock escalation

The seven bugs interviewers test most often on CRUD

Choosing the right DML operator (cheat sheet)

Frequently asked questions

What's the difference between DELETE, TRUNCATE, and DROP in SQL?

How do I write an UPSERT in SQL?

What happens if I run UPDATE or DELETE without a WHERE clause?

How do I update one table based on values in another table?

Should I use INSERT INTO SELECT or COPY / LOAD DATA INFILE for bulk loading?

Practice on PipeCode

`sql insert` and `insert into sql` — three shapes that cover 95% of inserts

`sql update` and `update statement sql` — never run without WHERE

`sql delete`, `TRUNCATE`, and `DROP` — three operators, three very different behaviours

`upsert sql` and `merge sql` — insert if new, update if exists

`BEGIN`, `COMMIT`, `ROLLBACK` — the safety net every CRUD statement deserves