sql insert, sql update, and sql delete are the three DML verbs that move every row in a warehouse: sql insert adds new rows (one row, many rows, or INSERT INTO … SELECT from another query), sql update changes existing rows in place (always paired with a WHERE clause unless you really mean "every row"), and sql delete removes rows (with WHERE, or via TRUNCATE for a fast whole-table reset). Together they form sql crud — the dml in sql vocabulary every sql interview questions panel tests, and every data engineering interview questions loop circles back to when the conversation turns from SELECT to data movement.
This guide walks through every clause in the sql data manipulation family that reviewers love to test in data engineering interview questions: the safe-CRUD mental model, insert into sql patterns (single-row, multi-row, INSERT … SELECT), the update statement sql family with UPDATE … FROM JOIN for cross-table updates, delete from sql vs TRUNCATE TABLE vs DROP TABLE (the three operators that look similar but behave nothing alike), upsert sql and merge sql with ON CONFLICT / MERGE INTO, transaction patterns (BEGIN / COMMIT / ROLLBACK), and the seven gotchas (missing WHERE clauses, fan-out from joins, lock escalation) that fail most candidates. Every section ends as sql interview questions with answers: a runnable PostgreSQL query, a traced execution, an output table, and a concept-by-concept why this works breakdown — the exact shape sql for data engineers rounds reward when crud in sql comes up.
When you want hands-on reps immediately after reading, browse data-manipulation SQL practice →, drill the SQL practice lane →, sharpen ETL SQL drills →, rehearse database problems →, or widen coverage on the full SQL practice library →.
On this page
- Why INSERT, UPDATE, DELETE matter in data engineering interviews
- INSERT — single-row, multi-row, and INSERT INTO SELECT
- UPDATE — WHERE-clause discipline and cross-table UPDATE FROM JOIN
- DELETE vs TRUNCATE vs DROP — three operators that look alike
- UPSERT, MERGE, and ON CONFLICT — the idempotent write pattern
- Transactions, rollback, and write safety
- DML interview gotchas — missing WHERE, fan-out, lock escalation
- Choosing the right DML operator (cheat sheet)
- Frequently asked questions
- Practice on PipeCode
1. Why INSERT, UPDATE, DELETE matter in data engineering interviews
Three verbs, one safety contract — that's the whole CRUD interview surface
The one-sentence invariant interviewers chase: sql insert adds rows, sql update changes rows in place, and sql delete removes rows; every one of them needs explicit row-targeting (VALUES, WHERE, or ON CONFLICT) or the statement silently touches the entire table — and that's the bug interviewers love to test. Once you internalise the three verbs plus the safety contract, every prompt in the crud sql / crud in sql cluster becomes "pick the right verb and prove your WHERE clause."
The three DML verbs at a glance.
-
INSERT INTO table (cols) VALUES (…)— adds new rows; multi-row formVALUES (…), (…), (…);INSERT INTO … SELECTingests from another query (sql insert into selectis the most-tested ETL shape). -
UPDATE table SET col = expr WHERE …— changes existing rows that match theWHEREclause;update statement sqlwithout aWHEREupdates every row. -
DELETE FROM table WHERE …— removes rows that match theWHEREclause;delete from sqlwithout aWHEREempties the table (and runs row-by-row, unlikeTRUNCATE).
The "safety contract" every CRUD statement must honour.
-
Always have a row-targeting clause —
WHEREforUPDATE/DELETE;VALUESorSELECTforINSERT. -
Wrap risky CRUD in a transaction —
BEGIN; UPDATE …; SELECT *; ROLLBACKlets you preview the change before committing. -
Test with
SELECTfirst —SELECT * FROM t WHERE <predicate>shows exactly which rows would be touched. -
Use
RETURNING(Postgres / SQL ServerOUTPUT) to get back the affected rows in the same statement.
Why interviewers love DML.
- Every data engineering pipeline writes data — INSERT for ingest, UPDATE for slowly-changing dimensions, DELETE for soft-delete cascades, MERGE for upserts.
-
The
where clause deletediscipline separates juniors from seniors — every senior has a "I once forgot theWHERE" story. -
Dialect divergence is large —
INSERT … ON CONFLICT(Postgres) vsINSERT … ON DUPLICATE KEY UPDATE(MySQL) vsMERGE INTO(SQL Server / Oracle / Snowflake) all do the same job with different syntax. -
Transaction semantics expose understanding of ACID —
BEGIN/COMMIT/ROLLBACKare basic-but-tested.
What interviewers listen for.
- Do you state the safety contract aloud before writing the UPDATE / DELETE? — "this needs a
WHERE order_id = 42to limit to that single row." - Do you wrap risky changes in a transaction? —
BEGIN; … ROLLBACKfor preview,COMMITfor commit. - Do you reach for
INSERT … ON CONFLICT(or the dialect equivalent) for idempotent ingest? — senior signal. - Do you mention
TRUNCATEvsDELETEperformance for full-table resets? — bonus points.
Worked example — INSERT + UPDATE + DELETE in one transactional batch
Detailed explanation. Real pipelines bundle multiple DML statements into a single transaction so they all commit (or all roll back) atomically. The canonical shape: ingest new rows, update changed rows, delete obsolete rows, all between BEGIN and COMMIT.
Question. From a staging table staging_orders(order_id, status, amount), apply the changes to a live_orders table: insert new orders, update existing orders whose status changed, and delete orders flagged as 'cancel'. Use a single transaction.
Input. staging_orders slice plus live_orders slice.
| staging.order_id | staging.status | staging.amount |
|---|---|---|
| 1 | paid | 100 |
| 2 | cancel | 80 |
| 3 | paid | 150 |
| live.order_id | live.status | live.amount |
|---|---|---|
| 1 | pending | 100 |
| 2 | pending | 80 |
Code (PostgreSQL).
BEGIN;
INSERT INTO live_orders (order_id, status, amount)
SELECT s.order_id, s.status, s.amount
FROM staging_orders s
LEFT JOIN live_orders l ON l.order_id = s.order_id
WHERE l.order_id IS NULL
AND s.status <> 'cancel';
UPDATE live_orders l
SET status = s.status,
amount = s.amount
FROM staging_orders s
WHERE l.order_id = s.order_id
AND s.status <> 'cancel'
AND (l.status <> s.status OR l.amount <> s.amount);
DELETE FROM live_orders l
USING staging_orders s
WHERE l.order_id = s.order_id
AND s.status = 'cancel';
COMMIT;
Step-by-step explanation.
-
BEGINopens a transaction — every following statement is part of the same atomic batch. -
INSERT INTO … SELECTadds new orders (left-anti join finds rows in staging that don't exist in live). -
UPDATE … FROM staging_ordersmutates existing rows whose status or amount changed. -
DELETE … USING staging_ordersremoves live rows that staging marks as cancelled. -
COMMITmakes the whole batch durable;ROLLBACKwould undo every statement.
Output (after commit).
| order_id | status | amount |
|---|---|---|
| 1 | paid | 100 |
| 3 | paid | 150 |
Rule of thumb: bundle related DML in a transaction; preview with SELECT before each statement; never skip the WHERE.
SQL
Topic — data-manipulation
SQL CRUD drills
SQL
Topic — etl
ETL SQL practice
2. INSERT — single-row, multi-row, and INSERT INTO SELECT
sql insert and insert into sql — three shapes that cover 95% of inserts
The sql insert family has three shapes that every reviewer expects you to know: single-row inserts for tests and seed data, multi-row inserts for batches, and INSERT INTO … SELECT for ingest from another table or query. Every dialect supports all three; the syntax is almost identical across PostgreSQL, MySQL, SQL Server, Oracle, and Snowflake.
Shape 1 — single-row INSERT INTO … VALUES (…).
-
Signature —
INSERT INTO <table> (<col1>, <col2>, …) VALUES (<val1>, <val2>, …). -
insert into sqlwith explicit column list is the canonical interview shape; skipping the column list is a code smell. -
Default values — columns with
DEFAULTconstraints are filled automatically when omitted from the column list. -
SERIAL/IDENTITY/AUTO_INCREMENT— sequence-backed columns auto-generate values; omit them from the column list.
Shape 2 — multi-row INSERT INTO … VALUES (…), (…), (…).
-
One statement, many rows —
INSERT INTO t (a, b) VALUES (1, 'x'), (2, 'y'), (3, 'z'). - Atomic — all rows insert or none do (one transactional unit).
- Faster than N single-row inserts by 10-100× — the planner builds one execution plan, not N.
-
MySQL caveat — the
max_allowed_packetserver setting caps the statement length; very large multi-row inserts need to be batched.
Shape 3 — INSERT INTO … SELECT (the ETL workhorse).
-
Signature —
INSERT INTO <target> (<cols>) SELECT <expr_list> FROM <source> WHERE <predicate>. -
sql insert into selectis what every staging-to-live ingest in a warehouse looks like. -
Column count and types must match between the
SELECTprojection and the target table's column list. - Can pull from any query — JOINs, CTEs, set operations, subqueries; the source need not be a base table.
bulk insert sql — beyond multi-row VALUES.
-
PostgreSQL
COPY FROM— the fastest bulk-load operator; reads from a file orSTDIN, bypasses the WAL formatter overhead. -
MySQL
LOAD DATA INFILE— equivalent for CSV ingestion. -
SQL Server
BULK INSERT/bcp— the SQL Server bulk loader. -
Snowflake
COPY INTO— pulls from S3 / GCS / Azure Blob. -
Use them when ingesting > 100k rows from a file;
INSERT … VALUESis too slow at scale.
INSERT … RETURNING — get back what you wrote.
-
PostgreSQL / Oracle —
INSERT INTO t (a) VALUES (1) RETURNING idreturns the generated id without a follow-upSELECT. -
SQL Server —
INSERT … OUTPUT INSERTED.*does the same. - Use case — capturing auto-generated primary keys; chaining inserts that need the new id; loading without a second round-trip.
Common INSERT pitfalls.
-
No column list —
INSERT INTO t VALUES (…)assumes positional matching; one schema change breaks every caller. -
NULL into NOT NULL — raises an error;
INSERT … VALUES (NULL, …)into aNOT NULLcolumn fails loudly. -
Type mismatches — explicit cast or
TRY_CASTupstream avoids the silent-coercion bugs (see Blog78). -
Forgotten
RETURNING— clients that need the new id make two round-trips instead of one.
SQL interview question — INSERT INTO … SELECT a deduplicated staging slice
Assume staging_users(email, name, created_at) may contain duplicate emails. Load distinct users into users(email, name, created_at) using INSERT INTO … SELECT.
Solution Using INSERT INTO … SELECT + DISTINCT ON
Detailed explanation. The standard ETL shape: read from staging, dedupe to one row per natural key, project the right columns, insert into the live table. DISTINCT ON (Postgres) picks one row per email; the equivalent in portable SQL is ROW_NUMBER() = 1 inside a CTE.
Code (PostgreSQL).
INSERT INTO users (email, name, created_at)
SELECT DISTINCT ON (email)
email, name, created_at
FROM staging_users
ORDER BY email, created_at;
Equivalent in portable SQL.
INSERT INTO users (email, name, created_at)
SELECT email, name, created_at
FROM (
SELECT email, name, created_at,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at) AS rn
FROM staging_users
) ranked
WHERE rn = 1;
Step-by-step trace.
| step | relation | outcome |
|---|---|---|
| 1 | Scan staging_users
|
every staging row read |
| 2 | DISTINCT ON (email) ... ORDER BY email, created_at |
first row per email |
| 3 | Project (email, name, created_at)
|
three columns |
| 4 | INSERT INTO users |
dedup'd rows land in live |
Output: users now has one row per unique email from staging.
Why this works — concept by concept:
- INSERT INTO … SELECT — the canonical ETL shape; reads from one relation and writes to another in a single statement.
-
DISTINCT ON (email) — Postgres-specific; pairs with
ORDER BY email, created_atto pick the earliest row per email. - ROW_NUMBER() = 1 — portable alternative; works on every dialect via window functions.
-
No PRIMARY KEY violation — the dedupe guarantees one row per
email; ifemailis the PK, raw insert without dedupe would error. -
Cost —
Θ(n)scan +Θ(n log n)sort for the dedupe; the insert itself isΘ(distinct(n)).
SQL
Topic — data-manipulation
INSERT SQL drills
SQL
Topic — etl
INSERT + ETL patterns
3. UPDATE — WHERE-clause discipline and cross-table UPDATE FROM JOIN
sql update and update statement sql — never run without WHERE
sql update changes existing rows in place. The single most-asked sql interview questions prompt in this space is "what happens if you forget the WHERE clause?" — the answer is "every row in the table gets updated, silently." This is the update statement sql safety lesson every senior data engineer has learned the hard way.
The UPDATE signature.
-
Single-table —
UPDATE <table> SET <col1> = <expr1>, <col2> = <expr2> WHERE <predicate>. -
The
WHEREclause is not optional in practice — without it, every row is updated. -
SETaccepts any expression —SET amount = amount * 1.1(10% increase);SET status = 'paid' WHERE …. -
RETURNING(Postgres / Oracle) —UPDATE … RETURNING id, old_status, new_statusreturns affected rows.
UPDATE … FROM JOIN — cross-table update (the senior pattern).
-
PostgreSQL —
UPDATE target t SET col = s.val FROM source s WHERE t.id = s.id. -
SQL Server —
UPDATE target SET col = s.val FROM target t JOIN source s ON t.id = s.id. -
MySQL —
UPDATE target t JOIN source s ON t.id = s.id SET t.col = s.val. -
Oracle / Snowflake — use
MERGE INTO … USING …instead.
The WHERE clause patterns.
-
Single-row update —
WHERE id = 42(the safest predicate; touches at most one row). -
Bounded set —
WHERE id IN (1, 2, 3)orWHERE status = 'pending'. -
Cross-table —
WHERE t.id = s.idin the JOIN form. -
Conditional —
SET status = CASE WHEN amount > 100 THEN 'large' ELSE 'small' END WHERE ….
sql update multiple columns — set many fields in one statement.
-
One
SETclause, multiple assignments —SET col1 = val1, col2 = val2, col3 = val3 WHERE id = 42. -
Use a sub-select for many fields —
SET (col1, col2) = (SELECT a, b FROM other WHERE …)in Postgres / Oracle. -
MySQL row constructor —
SET (col1, col2) = (SELECT a, b FROM other)since 8.0+.
UPDATE with a subquery (the lookup pattern).
UPDATE orders
SET region = (SELECT r.region
FROM customers c
JOIN regions r ON c.region_id = r.id
WHERE c.customer_id = orders.customer_id)
WHERE region IS NULL;
- Per-row correlated subquery — runs once per matched row.
- Pre-aggregate with a CTE for performance when the subquery is expensive.
-
Watch for NULL — if the subquery returns no row, the column is set to
NULL; guard withWHERE EXISTS.
Common UPDATE pitfalls.
-
Missing
WHERE— the canonical interview trap; updates every row in the table. -
Implicit type coercion —
SET int_col = '42'triggers per-row casting; useCAST(see Blog78). -
Triggering chain reactions —
BEFORE UPDATEtriggers fire on every row; can compound cost. - Updating a primary key — usually disallowed by foreign-key constraints; if needed, delete and reinsert.
SQL interview question — UPDATE FROM JOIN to backfill region from customer
Assume orders(order_id, customer_id, region) where region is sometimes NULL, and customers(customer_id, region) always has the correct value. Backfill orders.region from customers.region for the rows where orders.region IS NULL.
Solution Using UPDATE … FROM JOIN
Detailed explanation. Cross-table UPDATE is the textbook backfill shape. Postgres uses UPDATE … FROM; SQL Server uses UPDATE … FROM target JOIN source; MySQL uses UPDATE target JOIN source. The semantics are identical; the syntax differs.
Code (PostgreSQL).
UPDATE orders o
SET region = c.region
FROM customers c
WHERE o.customer_id = c.customer_id
AND o.region IS NULL;
Equivalent in MySQL.
UPDATE orders o
JOIN customers c ON o.customer_id = c.customer_id
SET o.region = c.region
WHERE o.region IS NULL;
Equivalent in SQL Server.
UPDATE o
SET region = c.region
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.region IS NULL;
Step-by-step trace.
| step | relation | outcome |
|---|---|---|
| 1 | Scan orders
|
all order rows |
| 2 | Hash JOIN with customers on customer_id
|
matching customer per order |
| 3 | WHERE o.region IS NULL |
only rows needing a backfill |
| 4 | SET region = c.region |
mutate the target column |
| 5 | Commit (implicit if no transaction wraps it) | changes durable |
Output: every NULL region in orders is replaced with the customer's actual region.
Why this works — concept by concept:
- UPDATE … FROM customers — Postgres's syntax for "update target using a joined source".
- WHERE o.region IS NULL — the safety predicate; only touches rows that need the backfill.
-
Hash join under the hood — the planner picks an efficient join strategy;
Θ(n + m)for the lookup. - No triggers fire on unchanged rows — the predicate keeps the change set minimal.
-
Wrap in a transaction for preview —
BEGIN; UPDATE …; SELECT * FROM orders WHERE region IS NULL; ROLLBACK(orCOMMIT). -
Cost — single scan of
orders+ hash probe intocustomers; the update isΘ(matched_rows).
SQL
Topic — data-manipulation
UPDATE SQL drills
SQL
Topic — joins
JOIN + UPDATE patterns
4. DELETE vs TRUNCATE vs DROP — three operators that look alike
sql delete, TRUNCATE, and DROP — three operators, three very different behaviours
sql delete removes rows that match a WHERE predicate. TRUNCATE TABLE removes every row in a fast, minimally-logged operation. DROP TABLE removes the table itself (rows AND schema). They look similar at first glance; their behaviour is wildly different — and the delete vs truncate vs drop comparison is one of the most-asked sql interview questions in the DML cluster.
DELETE FROM … WHERE … — row-by-row removal.
-
Signature —
DELETE FROM <table> WHERE <predicate>. - Per-row WAL logging — every deleted row is logged for rollback; expensive on large tables.
-
Triggers fire —
BEFORE DELETEandAFTER DELETEtriggers run per row. -
Foreign keys checked —
ON DELETE CASCADE/RESTRICTrules apply. -
Rollback supported —
DELETEinside a transaction can be rolled back. - Identity sequence not reset — the next inserted row gets the next sequence value.
-
delete from sqlwithoutWHERE— empties the table row-by-row (slow); never the right tool for full-table reset.
TRUNCATE TABLE — fast whole-table empty.
-
Signature —
TRUNCATE TABLE <table>(noWHEREclause possible). - Minimal logging — most engines log the page-level deallocation, not per-row.
-
Faster than
DELETEby 10-1000× on large tables. - Triggers do NOT fire — most engines skip row-level triggers.
-
Identity sequence reset (dialect-dependent) — Postgres
TRUNCATE … RESTART IDENTITY; SQL Server resets by default; MySQL resets by default. -
truncate vs delete— the textbook interview comparison; pick TRUNCATE for full-table reset, DELETE for predicate-based removal. - Cannot be rolled back in MySQL InnoDB — implicit commit; Postgres and SQL Server support transactional truncate.
DROP TABLE — remove the table itself.
-
Signature —
DROP TABLE <table>. - Removes rows, indexes, constraints, triggers, AND the table definition.
-
Irreversible — restoring requires a backup or
CREATE TABLEfrom a saved script. -
Foreign keys —
DROP TABLE … CASCADEremoves dependent objects too. - Use case — temporary tables, dev/test cleanup, deprecated schemas.
delete vs truncate vs drop — the decision table.
| Operator | Removes rows? | Removes schema? | Per-row triggers? | WAL cost | Rollback (in tx)? | Sequence reset? |
|---|---|---|---|---|---|---|
DELETE |
Yes (per WHERE) | No | Yes | per row (high) | Yes | No |
TRUNCATE |
Yes (all) | No | No (most engines) | minimal | Yes (PG/SQL Server) | Yes (most engines) |
DROP |
Yes (all) | Yes | n/a | minimal | Yes (PG, SQL Server) | n/a |
delete from sql with where — the safe shape.
- Always start with
SELECT * FROM t WHERE <predicate>to confirm the row set. - Then change
SELECT *toDELETEkeeping the sameWHERE. - Wrap in
BEGIN; DELETE …; SELECT COUNT(*) FROM t WHERE …; ROLLBACKto preview. -
RETURNING *(Postgres) shows exactly which rows were deleted.
DELETE with JOIN (the related-tables shape).
-- PostgreSQL: DELETE … USING
DELETE FROM orders o
USING customers c
WHERE o.customer_id = c.customer_id
AND c.status = 'closed';
-- MySQL: DELETE alias FROM target JOIN source
DELETE o
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE c.status = 'closed';
-- SQL Server: DELETE … FROM
DELETE o
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE c.status = 'closed';
SQL interview question — purge orders for closed customers
Assume orders(order_id, customer_id, status) and customers(customer_id, status). Delete every order whose customer's status is 'closed'. Use a transaction.
Solution Using DELETE … USING + transaction
Detailed explanation. Wrap the destructive operation in a transaction so a preview is safe. DELETE … USING (Postgres) is the multi-table delete shape; SQL Server uses DELETE … FROM; MySQL uses DELETE alias FROM … JOIN ….
Code (PostgreSQL).
BEGIN;
-- preview first
SELECT o.order_id, o.customer_id, c.status
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE c.status = 'closed';
DELETE FROM orders o
USING customers c
WHERE o.customer_id = c.customer_id
AND c.status = 'closed'
RETURNING o.order_id, o.customer_id;
COMMIT;
Step-by-step trace.
| step | relation | outcome |
|---|---|---|
| 1 | BEGIN |
open transaction |
| 2 | Preview SELECT
|
confirm the rows that would be deleted |
| 3 |
DELETE … USING joins to customers
|
identify orders whose customer is closed |
| 4 | RETURNING o.order_id, o.customer_id |
capture deleted-row identifiers |
| 5 | COMMIT |
changes durable |
Output: the RETURNING clause emits one row per deleted order.
Why this works — concept by concept:
-
BEGIN / COMMIT wraps the destructive op — a
ROLLBACKmid-transaction undoes the delete without permanent damage. - Preview SELECT first — confirms the predicate matches the expected rows.
-
DELETE … USING — Postgres's multi-table delete syntax; same idea as MySQL's
DELETE a FROM a JOIN b. - RETURNING captures deleted IDs — useful for downstream logging or compensating actions.
-
No CASCADE concerns — the delete is on
ordersonly;customersrows are untouched. -
Cost — single scan + hash probe; deletes
Θ(matched_rows)rows.
SQL
Topic — data-manipulation
DELETE SQL drills
SQL
Topic — database
Database SQL library
5. UPSERT, MERGE, and ON CONFLICT — the idempotent write pattern
upsert sql and merge sql — insert if new, update if exists
upsert sql is the idempotent write pattern: insert the row if it doesn't exist, update it if it does. Every dialect spells this differently — INSERT … ON CONFLICT DO UPDATE (Postgres), INSERT … ON DUPLICATE KEY UPDATE (MySQL), MERGE INTO (SQL Server / Oracle / Snowflake / BigQuery) — but the semantics are identical. The merge into sql server / insert on conflict postgres family is the second-most-asked dml in sql interview topic after delete vs truncate.
PostgreSQL INSERT … ON CONFLICT DO UPDATE — the most readable.
INSERT INTO users (email, name, last_login)
VALUES ('alice@example.com', 'Alice', NOW())
ON CONFLICT (email) DO UPDATE
SET name = EXCLUDED.name,
last_login = EXCLUDED.last_login;
-
ON CONFLICT (col)— names the unique key that triggers conflict resolution. -
DO UPDATE SET …— the update clause;EXCLUDED.colreferences the values from the attempted insert. -
DO NOTHING— skip the row instead of updating; useful for idempotent inserts. -
insert on conflict postgresis the most-searched form of this pattern.
MySQL INSERT … ON DUPLICATE KEY UPDATE.
INSERT INTO users (email, name, last_login)
VALUES ('alice@example.com', 'Alice', NOW())
ON DUPLICATE KEY UPDATE
name = VALUES(name),
last_login = VALUES(last_login);
-
ON DUPLICATE KEY— fires when any unique key would be violated. -
VALUES(col)— references the value from the attempted insert (deprecated in MySQL 8.0.20+; replaced byAS new SET col = new.col). -
Modern MySQL —
INSERT INTO users (…) VALUES (…) AS new ON DUPLICATE KEY UPDATE col = new.col.
SQL Server / Oracle / Snowflake MERGE INTO.
MERGE INTO users AS target
USING (VALUES ('alice@example.com', 'Alice', CURRENT_TIMESTAMP)) AS src(email, name, last_login)
ON target.email = src.email
WHEN MATCHED THEN
UPDATE SET name = src.name, last_login = src.last_login
WHEN NOT MATCHED THEN
INSERT (email, name, last_login) VALUES (src.email, src.name, src.last_login);
-
MERGE INTO target USING source ON predicate— ANSI-standard merge. -
WHEN MATCHED THEN UPDATE— runs when the predicate matches. -
WHEN NOT MATCHED THEN INSERT— runs when the predicate doesn't match. -
WHEN NOT MATCHED BY SOURCE THEN DELETE(SQL Server) — full three-way merge. -
merge into sql serverhas been GA since SQL Server 2008; PostgreSQL addedMERGEin 15.
Snowflake / BigQuery MERGE.
-
Snowflake — uses the same
MERGE INTOsyntax as SQL Server; very common in cloud warehouse interviews. -
BigQuery —
MERGEwith the same shape; used heavily in slowly-changing-dimension (SCD) Type 2 pipelines.
The use cases that demand UPSERT.
- Idempotent ingest — re-running a pipeline must not double-insert.
-
Slowly-changing dimensions — customer dimension updates without losing history (combine
MERGEwith effective-date columns). - Event aggregation — running totals that update if the (entity, day) key exists, insert otherwise.
- CDC / change-data-capture — apply inserts, updates, and deletes from a source change feed.
When to reach for which.
-
PostgreSQL primary stack —
INSERT … ON CONFLICT DO UPDATE; most readable, native, and well-supported. -
MySQL primary stack —
INSERT … ON DUPLICATE KEY UPDATE; equally native. -
SQL Server / Oracle / Snowflake / BigQuery —
MERGE INTO; the only option, also the most powerful (three-way matched / not-matched / not-matched-by-source). -
PostgreSQL 15+ — also has
MERGE; pickON CONFLICTfor upserts andMERGEfor three-way logic.
SQL
Topic — data-manipulation
UPSERT / MERGE drills
SQL
Topic — etl
ETL + UPSERT patterns
6. Transactions, rollback, and write safety
BEGIN, COMMIT, ROLLBACK — the safety net every CRUD statement deserves
Every destructive sql update or sql delete belongs inside a transaction. The pattern is the same on every dialect: open the transaction, run the DML, inspect the result, then commit or roll back. The cost is one extra line of SQL; the upside is that you can preview and undo any mistake before it becomes permanent.
The canonical preview-and-commit pattern.
BEGIN;
-- preview the change
SELECT * FROM orders WHERE region = 'unknown';
-- do the change
UPDATE orders SET region = 'EU' WHERE region = 'unknown';
-- inspect again
SELECT * FROM orders WHERE region = 'unknown'; -- should be 0 rows
SELECT * FROM orders WHERE region = 'EU'; -- should include the updated rows
-- if happy:
COMMIT;
-- if not:
-- ROLLBACK;
Transaction primitives by dialect.
-
PostgreSQL —
BEGIN; … COMMIT;orBEGIN; … ROLLBACK;; alsoSAVEPOINTandROLLBACK TO SAVEPOINT. -
MySQL —
START TRANSACTION; … COMMIT;;SET autocommit = 0to make every statement transactional. -
SQL Server —
BEGIN TRANSACTION; … COMMIT TRANSACTION;orROLLBACK TRANSACTION. -
Oracle —
SET TRANSACTION READ WRITE; … COMMIT;; implicit transaction starts on first DML. -
Snowflake —
BEGIN; … COMMIT;; full ACID semantics.
ACID properties — what transactions guarantee.
- Atomicity — all DML in the transaction commits together, or none does.
-
Consistency — constraints (
NOT NULL, foreign keys, unique) are checked at commit time. - Isolation — concurrent transactions don't see each other's uncommitted changes (depends on isolation level).
- Durability — once committed, the change survives a crash (WAL / redo log).
Isolation levels — the cheat-bullet.
- READ UNCOMMITTED — sees uncommitted writes from others ("dirty reads"); rarely used.
- READ COMMITTED — sees only committed writes; the default in Postgres, SQL Server, Oracle.
- REPEATABLE READ — repeated reads in the same transaction return the same result; MySQL default.
- SERIALIZABLE — strongest; transactions appear to execute one at a time; sometimes via retry.
ROLLBACK — undo without commit.
-
Inside a transaction —
ROLLBACKdiscards every change sinceBEGIN. -
SAVEPOINT s1+ROLLBACK TO s1— partial rollback to a named point inside a transaction. - Implicit rollback — on session disconnect, network error, server crash; uncommitted changes are discarded.
-
The senior signal — knowing that
ROLLBACKis your friend and using it freely.
The cost of long-running transactions.
-
Lock duration — a long
UPDATEholds row-level locks until commit; concurrent writers wait. - WAL / undo log size — large transactions inflate the log; vacuum / log truncation depends on commit.
-
Snapshot bloat (Postgres) — MVCC snapshots accumulate; long transactions block
VACUUMfrom reclaiming space. - Production discipline — break large batches into smaller transactions; commit every 1k–10k rows.
RETURNING / OUTPUT — pair with transactions.
-
UPDATE … RETURNING …(Postgres) — captures changed rows inside the transaction. -
OUTPUT INSERTED.*, DELETED.*(SQL Server) — captures both old and new values; perfect for audit logs. - Use case — write an audit-trail INSERT in the same transaction that captures the UPDATE/DELETE side-effects.
SQL
Topic — database
Transactions SQL drills
SQL
Topic — data-manipulation
Transactional CRUD patterns
7. DML interview gotchas — missing WHERE, fan-out, lock escalation
The seven bugs interviewers test most often on CRUD
DML has a small surface area but a long tail of edge cases that fail candidates. These are the seven gotchas reviewers test most often when crud sql comes up in an interview.
Gotcha 1 — UPDATE / DELETE without WHERE.
-
The bug —
UPDATE orders SET status = 'paid'updates every row inorders. - Symptom — production-data corruption; "I just lost an afternoon" engineer slack messages.
-
Fix — always preview with
SELECT * FROM t WHERE <predicate>first; wrap in a transaction. -
The discipline — type
WHEREbeforeSETwhen writingUPDATEfrom scratch.
Gotcha 2 — fan-out from UPDATE … FROM JOIN.
- The bug — joining to a source table with multiple matching rows updates the target row N times with the last match (engine-dependent).
- Symptom — non-deterministic results; "the update sometimes does the right thing".
- Fix — pre-aggregate the source to one row per join key inside a CTE.
Gotcha 3 — DELETE cascade chains too far.
-
The bug —
ON DELETE CASCADEforeign keys propagate deletes through unrelated tables. - Symptom — deleting one customer wipes out their orders, their payments, their reviews — none of which the engineer intended.
-
Fix — confirm the foreign-key topology; use
RESTRICTinstead ofCASCADEfor safety; soft-delete via adeleted_atcolumn instead.
Gotcha 4 — TRUNCATE cannot be rolled back in MySQL.
-
The behaviour — MySQL InnoDB
TRUNCATEdoes an implicit commit; once issued, noROLLBACKrecovers the data. -
Symptom — "I wrapped it in
START TRANSACTIONand it still committed". -
Fix — use
DELETE FROM tinside a transaction for previewable removal; or take a backup beforeTRUNCATE.
Gotcha 5 — INSERT … SELECT without dedup violates UNIQUE.
- The bug — staging table has duplicates; insert into a table with a UNIQUE constraint errors on the second occurrence.
-
Symptom —
ERROR: duplicate key value violates unique constraint. -
Fix — dedup with
DISTINCT ON/ROW_NUMBER() = 1(Postgres) orINSERT … ON CONFLICT DO NOTHINGfor an idempotent ingest.
Gotcha 6 — implicit type coercion on the SET side of UPDATE.
-
The bug —
SET int_col = '42'silently parses the string; if the string is'abc'the update errors mid-batch and leaves the table partially updated (if no transaction wraps it). -
Fix — explicit
CAST(see Blog78); validate upstream; wrap in a transaction so partial application can roll back.
Gotcha 7 — lock escalation on large UPDATE / DELETE.
- The behaviour — many row-level locks get promoted to table-level locks above an engine-specific threshold; concurrent writers block until the operation completes.
- Symptom — production app freezes for the duration of a multi-million-row UPDATE.
-
Fix — batch the change in chunks (
WHERE id BETWEEN 1 AND 10000,LIMIT 10000, etc.) and commit between chunks.
SQL
Topic — data-manipulation
DML gotcha drills
SQL
Topic — database
Database safety drills
Choosing the right DML operator (cheat sheet)
A one-screen cheat sheet for using SQL INSERT, UPDATE, DELETE — pick the verb that matches your intent, then the dialect-specific syntax.
| You want to … | Operator | Notes |
|---|---|---|
| Add new rows from literals | INSERT INTO t (cols) VALUES (…) |
Always include the column list |
| Add many rows in one statement | INSERT INTO t (cols) VALUES (…), (…), (…) |
Atomic; 10-100× faster than N inserts |
| Add rows from another table | INSERT INTO t (cols) SELECT … FROM s |
The ETL workhorse |
| Bulk-load from a file |
COPY (PG) / LOAD DATA INFILE (MySQL) / BULK INSERT (SQL Server) |
For > 100k rows |
| Change existing rows | UPDATE t SET … WHERE … |
Always WHERE
|
| Update from another table |
UPDATE t SET col = s.val FROM s WHERE … (PG) |
UPDATE t JOIN s (MySQL); UPDATE t … FROM (SQL Server) |
| Remove specific rows | DELETE FROM t WHERE … |
Per-row WAL; supports rollback |
| Empty a whole table fast | TRUNCATE TABLE t |
Minimally logged; resets identity |
| Remove the table itself | DROP TABLE t |
Irreversible without backup |
| Insert if new, update if exists |
INSERT … ON CONFLICT DO UPDATE (PG) |
ON DUPLICATE KEY UPDATE (MySQL); MERGE INTO (SQL Server, Oracle, Snowflake) |
| Idempotent insert (skip duplicates) | INSERT … ON CONFLICT DO NOTHING |
Safe for re-runnable pipelines |
| Three-way merge (insert/update/delete) | MERGE INTO t USING s ON … |
SCD Type 2 pattern |
| Preview before commit | BEGIN; SELECT …; DML; SELECT …; ROLLBACK; |
The safety pattern |
| Capture affected rows |
… RETURNING * (PG/Oracle) / OUTPUT INSERTED.*, DELETED.* (SQL Server) |
One round-trip audit |
Frequently asked questions
What's the difference between DELETE, TRUNCATE, and DROP in SQL?
DELETE removes rows that match a WHERE predicate — it's row-by-row, fully logged, triggers fire, foreign keys are checked, and the change can be rolled back inside a transaction. TRUNCATE TABLE removes every row in a single fast operation — minimally logged, no WHERE allowed, no triggers fire on most engines, the identity sequence usually resets, and it's 10-1000× faster than DELETE on large tables (but cannot be rolled back in MySQL InnoDB). DROP TABLE removes the table itself — rows AND schema, indexes, constraints, triggers — and is irreversible without a backup. The canonical interview rule: DELETE for row-level removal with rollback safety; TRUNCATE for fast whole-table reset; DROP for deleting the table definition entirely.
How do I write an UPSERT in SQL?
Every dialect spells UPSERT differently. PostgreSQL — INSERT INTO t (cols) VALUES (…) ON CONFLICT (unique_col) DO UPDATE SET col = EXCLUDED.col. MySQL — INSERT INTO t (cols) VALUES (…) ON DUPLICATE KEY UPDATE col = VALUES(col) (or the modern … AS new ON DUPLICATE KEY UPDATE col = new.col since 8.0.20). SQL Server / Oracle / Snowflake / BigQuery / PostgreSQL 15+ — MERGE INTO t USING source ON match_predicate WHEN MATCHED THEN UPDATE SET … WHEN NOT MATCHED THEN INSERT (…) VALUES (…). The semantics are identical across all of them: insert if no row matches the unique key, update if a row already exists. UPSERT is the foundation of idempotent pipelines — re-running an ingest must not double-insert, and INSERT … ON CONFLICT / MERGE makes that a one-liner.
What happens if I run UPDATE or DELETE without a WHERE clause?
It updates or deletes every row in the table — silently, with no warning. This is the canonical sql interview questions trap and the source of more "I just nuked production" stories than any other DML mistake. The defence: never write UPDATE or DELETE from a blank starting point; type the WHERE clause first, then the SET / FROM. Wrap risky DML in a transaction (BEGIN; UPDATE …; SELECT *; ROLLBACK). Preview with SELECT * FROM t WHERE <predicate> to confirm the row set before changing the verb. Some engines (MySQL 8 with safe-updates mode) refuse to execute UPDATE / DELETE without a WHERE clause that includes a key column — enable this setting in development environments.
How do I update one table based on values in another table?
Use cross-table UPDATE — the dialect-specific syntax differs but the semantics are identical. PostgreSQL — UPDATE target t SET col = s.val FROM source s WHERE t.id = s.id. MySQL — UPDATE target t JOIN source s ON t.id = s.id SET t.col = s.val. SQL Server — UPDATE t SET col = s.val FROM target t JOIN source s ON t.id = s.id. Oracle / Snowflake — use MERGE INTO target USING source ON (…) WHEN MATCHED THEN UPDATE SET …. Always include a WHERE clause (or a MERGE join predicate) that scopes the update; pre-aggregate the source to one row per join key inside a CTE if the source can have duplicates, otherwise the update fans out non-deterministically.
Should I use INSERT INTO SELECT or COPY / LOAD DATA INFILE for bulk loading?
For under ~100k rows, INSERT INTO t (cols) SELECT … FROM staging (or multi-row INSERT … VALUES) is fine — the planner builds one execution plan and writes are batched. For larger loads, switch to the dialect's bulk loader: PostgreSQL COPY FROM (the fastest by far for file ingest, bypasses much of the WAL overhead), MySQL LOAD DATA INFILE, SQL Server BULK INSERT or bcp, Snowflake COPY INTO (from S3 / GCS / Azure), BigQuery's LOAD DATA or table-loader API. The bulk loaders skip per-row WAL formatting and trigger firing where possible — they trade some safety for raw throughput. Inside an ETL pipeline, the standard pattern is: bulk-load into a staging table, then run INSERT INTO live SELECT … FROM staging with explicit deduplication and constraint checks.
Practice on PipeCode
PipeCode ships 450+ data-engineering interview problems — including PostgreSQL-first SQL practice keyed to DML CRUD (INSERT, UPDATE, DELETE), UPSERT patterns with ON CONFLICT / MERGE, transactions, ETL ingest, and the dialect quirks that fail candidates who memorise only one engine. Whether you're drilling sql interview questions with answers for sql for data engineers loops or grinding through data engineering interview questions end-to-end, the practice library mirrors the same safety-first mental model this guide teaches.
Kick off via Explore practice →; drill the data-manipulation SQL lane →; fan out into the ETL SQL lane →; rehearse database problems →; reinforce JOIN + UPDATE patterns →; widen coverage on the full SQL practice library →.





Top comments (0)