DEV Community

Cover image for INSERT, UPDATE, DELETE in SQL: Safe CRUD Patterns for Data Engineers
Gowtham Potureddi
Gowtham Potureddi

Posted on

INSERT, UPDATE, DELETE in SQL: Safe CRUD Patterns for Data Engineers

sql insert, sql update, and sql delete are the three DML verbs that move every row in a warehouse: sql insert adds new rows (one row, many rows, or INSERT INTO … SELECT from another query), sql update changes existing rows in place (always paired with a WHERE clause unless you really mean "every row"), and sql delete removes rows (with WHERE, or via TRUNCATE for a fast whole-table reset). Together they form sql crud — the dml in sql vocabulary every sql interview questions panel tests, and every data engineering interview questions loop circles back to when the conversation turns from SELECT to data movement.

This guide walks through every clause in the sql data manipulation family that reviewers love to test in data engineering interview questions: the safe-CRUD mental model, insert into sql patterns (single-row, multi-row, INSERT … SELECT), the update statement sql family with UPDATE … FROM JOIN for cross-table updates, delete from sql vs TRUNCATE TABLE vs DROP TABLE (the three operators that look similar but behave nothing alike), upsert sql and merge sql with ON CONFLICT / MERGE INTO, transaction patterns (BEGIN / COMMIT / ROLLBACK), and the seven gotchas (missing WHERE clauses, fan-out from joins, lock escalation) that fail most candidates. Every section ends as sql interview questions with answers: a runnable PostgreSQL query, a traced execution, an output table, and a concept-by-concept why this works breakdown — the exact shape sql for data engineers rounds reward when crud in sql comes up.

PipeCode blog header for a SQL CRUD tutorial — bold white headline 'INSERT · UPDATE · DELETE' with subtitle 'safe CRUD patterns for data engineers' and a minimal code snippet on a dark gradient with purple, green, and orange accents and a small pipecode.ai attribution.

When you want hands-on reps immediately after reading, browse data-manipulation SQL practice →, drill the SQL practice lane →, sharpen ETL SQL drills →, rehearse database problems →, or widen coverage on the full SQL practice library →.


On this page


1. Why INSERT, UPDATE, DELETE matter in data engineering interviews

Three verbs, one safety contract — that's the whole CRUD interview surface

The one-sentence invariant interviewers chase: sql insert adds rows, sql update changes rows in place, and sql delete removes rows; every one of them needs explicit row-targeting (VALUES, WHERE, or ON CONFLICT) or the statement silently touches the entire table — and that's the bug interviewers love to test. Once you internalise the three verbs plus the safety contract, every prompt in the crud sql / crud in sql cluster becomes "pick the right verb and prove your WHERE clause."

The three DML verbs at a glance.

  • INSERT INTO table (cols) VALUES (…) — adds new rows; multi-row form VALUES (…), (…), (…); INSERT INTO … SELECT ingests from another query (sql insert into select is the most-tested ETL shape).
  • UPDATE table SET col = expr WHERE … — changes existing rows that match the WHERE clause; update statement sql without a WHERE updates every row.
  • DELETE FROM table WHERE … — removes rows that match the WHERE clause; delete from sql without a WHERE empties the table (and runs row-by-row, unlike TRUNCATE).

The "safety contract" every CRUD statement must honour.

  • Always have a row-targeting clauseWHERE for UPDATE / DELETE; VALUES or SELECT for INSERT.
  • Wrap risky CRUD in a transactionBEGIN; UPDATE …; SELECT *; ROLLBACK lets you preview the change before committing.
  • Test with SELECT firstSELECT * FROM t WHERE <predicate> shows exactly which rows would be touched.
  • Use RETURNING (Postgres / SQL Server OUTPUT) to get back the affected rows in the same statement.

Why interviewers love DML.

  • Every data engineering pipeline writes data — INSERT for ingest, UPDATE for slowly-changing dimensions, DELETE for soft-delete cascades, MERGE for upserts.
  • The where clause delete discipline separates juniors from seniors — every senior has a "I once forgot the WHERE" story.
  • Dialect divergence is largeINSERT … ON CONFLICT (Postgres) vs INSERT … ON DUPLICATE KEY UPDATE (MySQL) vs MERGE INTO (SQL Server / Oracle / Snowflake) all do the same job with different syntax.
  • Transaction semantics expose understanding of ACIDBEGIN / COMMIT / ROLLBACK are basic-but-tested.

What interviewers listen for.

  • Do you state the safety contract aloud before writing the UPDATE / DELETE? — "this needs a WHERE order_id = 42 to limit to that single row."
  • Do you wrap risky changes in a transaction? — BEGIN; … ROLLBACK for preview, COMMIT for commit.
  • Do you reach for INSERT … ON CONFLICT (or the dialect equivalent) for idempotent ingest? — senior signal.
  • Do you mention TRUNCATE vs DELETE performance for full-table resets? — bonus points.

Worked example — INSERT + UPDATE + DELETE in one transactional batch

Detailed explanation. Real pipelines bundle multiple DML statements into a single transaction so they all commit (or all roll back) atomically. The canonical shape: ingest new rows, update changed rows, delete obsolete rows, all between BEGIN and COMMIT.

Question. From a staging table staging_orders(order_id, status, amount), apply the changes to a live_orders table: insert new orders, update existing orders whose status changed, and delete orders flagged as 'cancel'. Use a single transaction.

Input. staging_orders slice plus live_orders slice.

staging.order_id staging.status staging.amount
1 paid 100
2 cancel 80
3 paid 150
live.order_id live.status live.amount
1 pending 100
2 pending 80

Code (PostgreSQL).

BEGIN;

INSERT INTO live_orders (order_id, status, amount)
SELECT s.order_id, s.status, s.amount
FROM staging_orders s
LEFT JOIN live_orders l ON l.order_id = s.order_id
WHERE l.order_id IS NULL
  AND s.status <> 'cancel';

UPDATE live_orders l
SET status = s.status,
    amount = s.amount
FROM staging_orders s
WHERE l.order_id = s.order_id
  AND s.status <> 'cancel'
  AND (l.status <> s.status OR l.amount <> s.amount);

DELETE FROM live_orders l
USING staging_orders s
WHERE l.order_id = s.order_id
  AND s.status = 'cancel';

COMMIT;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. BEGIN opens a transaction — every following statement is part of the same atomic batch.
  2. INSERT INTO … SELECT adds new orders (left-anti join finds rows in staging that don't exist in live).
  3. UPDATE … FROM staging_orders mutates existing rows whose status or amount changed.
  4. DELETE … USING staging_orders removes live rows that staging marks as cancelled.
  5. COMMIT makes the whole batch durable; ROLLBACK would undo every statement.

Output (after commit).

order_id status amount
1 paid 100
3 paid 150

Rule of thumb: bundle related DML in a transaction; preview with SELECT before each statement; never skip the WHERE.

SQL
Topic — data-manipulation
SQL CRUD drills

Practice →

SQL
Topic — etl
ETL SQL practice

Practice →


2. INSERT — single-row, multi-row, and INSERT INTO SELECT

Diagram of the INSERT family — three side-by-side cards showing single-row INSERT INTO … VALUES, multi-row INSERT with three tuples, and INSERT INTO … SELECT from another query, on a light PipeCode card.

sql insert and insert into sql — three shapes that cover 95% of inserts

The sql insert family has three shapes that every reviewer expects you to know: single-row inserts for tests and seed data, multi-row inserts for batches, and INSERT INTO … SELECT for ingest from another table or query. Every dialect supports all three; the syntax is almost identical across PostgreSQL, MySQL, SQL Server, Oracle, and Snowflake.

Shape 1 — single-row INSERT INTO … VALUES (…).

  • SignatureINSERT INTO <table> (<col1>, <col2>, …) VALUES (<val1>, <val2>, …).
  • insert into sql with explicit column list is the canonical interview shape; skipping the column list is a code smell.
  • Default values — columns with DEFAULT constraints are filled automatically when omitted from the column list.
  • SERIAL / IDENTITY / AUTO_INCREMENT — sequence-backed columns auto-generate values; omit them from the column list.

Shape 2 — multi-row INSERT INTO … VALUES (…), (…), (…).

  • One statement, many rowsINSERT INTO t (a, b) VALUES (1, 'x'), (2, 'y'), (3, 'z').
  • Atomic — all rows insert or none do (one transactional unit).
  • Faster than N single-row inserts by 10-100× — the planner builds one execution plan, not N.
  • MySQL caveat — the max_allowed_packet server setting caps the statement length; very large multi-row inserts need to be batched.

Shape 3 — INSERT INTO … SELECT (the ETL workhorse).

  • SignatureINSERT INTO <target> (<cols>) SELECT <expr_list> FROM <source> WHERE <predicate>.
  • sql insert into select is what every staging-to-live ingest in a warehouse looks like.
  • Column count and types must match between the SELECT projection and the target table's column list.
  • Can pull from any query — JOINs, CTEs, set operations, subqueries; the source need not be a base table.

bulk insert sql — beyond multi-row VALUES.

  • PostgreSQL COPY FROM — the fastest bulk-load operator; reads from a file or STDIN, bypasses the WAL formatter overhead.
  • MySQL LOAD DATA INFILE — equivalent for CSV ingestion.
  • SQL Server BULK INSERT / bcp — the SQL Server bulk loader.
  • Snowflake COPY INTO — pulls from S3 / GCS / Azure Blob.
  • Use them when ingesting > 100k rows from a file; INSERT … VALUES is too slow at scale.

INSERT … RETURNING — get back what you wrote.

  • PostgreSQL / OracleINSERT INTO t (a) VALUES (1) RETURNING id returns the generated id without a follow-up SELECT.
  • SQL ServerINSERT … OUTPUT INSERTED.* does the same.
  • Use case — capturing auto-generated primary keys; chaining inserts that need the new id; loading without a second round-trip.

Common INSERT pitfalls.

  • No column listINSERT INTO t VALUES (…) assumes positional matching; one schema change breaks every caller.
  • NULL into NOT NULL — raises an error; INSERT … VALUES (NULL, …) into a NOT NULL column fails loudly.
  • Type mismatches — explicit cast or TRY_CAST upstream avoids the silent-coercion bugs (see Blog78).
  • Forgotten RETURNING — clients that need the new id make two round-trips instead of one.

SQL interview question — INSERT INTO … SELECT a deduplicated staging slice

Assume staging_users(email, name, created_at) may contain duplicate emails. Load distinct users into users(email, name, created_at) using INSERT INTO … SELECT.

Solution Using INSERT INTO … SELECT + DISTINCT ON

Detailed explanation. The standard ETL shape: read from staging, dedupe to one row per natural key, project the right columns, insert into the live table. DISTINCT ON (Postgres) picks one row per email; the equivalent in portable SQL is ROW_NUMBER() = 1 inside a CTE.

Code (PostgreSQL).

INSERT INTO users (email, name, created_at)
SELECT DISTINCT ON (email)
       email, name, created_at
FROM staging_users
ORDER BY email, created_at;
Enter fullscreen mode Exit fullscreen mode

Equivalent in portable SQL.

INSERT INTO users (email, name, created_at)
SELECT email, name, created_at
FROM (
    SELECT email, name, created_at,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at) AS rn
    FROM staging_users
) ranked
WHERE rn = 1;
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

step relation outcome
1 Scan staging_users every staging row read
2 DISTINCT ON (email) ... ORDER BY email, created_at first row per email
3 Project (email, name, created_at) three columns
4 INSERT INTO users dedup'd rows land in live

Output: users now has one row per unique email from staging.

Why this works — concept by concept:

  • INSERT INTO … SELECT — the canonical ETL shape; reads from one relation and writes to another in a single statement.
  • DISTINCT ON (email) — Postgres-specific; pairs with ORDER BY email, created_at to pick the earliest row per email.
  • ROW_NUMBER() = 1 — portable alternative; works on every dialect via window functions.
  • No PRIMARY KEY violation — the dedupe guarantees one row per email; if email is the PK, raw insert without dedupe would error.
  • CostΘ(n) scan + Θ(n log n) sort for the dedupe; the insert itself is Θ(distinct(n)).

SQL
Topic — data-manipulation
INSERT SQL drills

Practice →

SQL
Topic — etl
INSERT + ETL patterns

Practice →


3. UPDATE — WHERE-clause discipline and cross-table UPDATE FROM JOIN

Diagram of UPDATE shapes — left card shows a simple UPDATE … SET … WHERE; right card shows an UPDATE … FROM joined_table … WHERE with the join condition highlighted, with arrows from the source table feeding the update target, on a light PipeCode card.

sql update and update statement sql — never run without WHERE

sql update changes existing rows in place. The single most-asked sql interview questions prompt in this space is "what happens if you forget the WHERE clause?" — the answer is "every row in the table gets updated, silently." This is the update statement sql safety lesson every senior data engineer has learned the hard way.

The UPDATE signature.

  • Single-tableUPDATE <table> SET <col1> = <expr1>, <col2> = <expr2> WHERE <predicate>.
  • The WHERE clause is not optional in practice — without it, every row is updated.
  • SET accepts any expressionSET amount = amount * 1.1 (10% increase); SET status = 'paid' WHERE ….
  • RETURNING (Postgres / Oracle)UPDATE … RETURNING id, old_status, new_status returns affected rows.

UPDATE … FROM JOIN — cross-table update (the senior pattern).

  • PostgreSQLUPDATE target t SET col = s.val FROM source s WHERE t.id = s.id.
  • SQL ServerUPDATE target SET col = s.val FROM target t JOIN source s ON t.id = s.id.
  • MySQLUPDATE target t JOIN source s ON t.id = s.id SET t.col = s.val.
  • Oracle / Snowflake — use MERGE INTO … USING … instead.

The WHERE clause patterns.

  • Single-row updateWHERE id = 42 (the safest predicate; touches at most one row).
  • Bounded setWHERE id IN (1, 2, 3) or WHERE status = 'pending'.
  • Cross-tableWHERE t.id = s.id in the JOIN form.
  • ConditionalSET status = CASE WHEN amount > 100 THEN 'large' ELSE 'small' END WHERE ….

sql update multiple columns — set many fields in one statement.

  • One SET clause, multiple assignmentsSET col1 = val1, col2 = val2, col3 = val3 WHERE id = 42.
  • Use a sub-select for many fieldsSET (col1, col2) = (SELECT a, b FROM other WHERE …) in Postgres / Oracle.
  • MySQL row constructorSET (col1, col2) = (SELECT a, b FROM other) since 8.0+.

UPDATE with a subquery (the lookup pattern).

UPDATE orders
SET region = (SELECT r.region
              FROM customers c
              JOIN regions r ON c.region_id = r.id
              WHERE c.customer_id = orders.customer_id)
WHERE region IS NULL;
Enter fullscreen mode Exit fullscreen mode
  • Per-row correlated subquery — runs once per matched row.
  • Pre-aggregate with a CTE for performance when the subquery is expensive.
  • Watch for NULL — if the subquery returns no row, the column is set to NULL; guard with WHERE EXISTS.

Common UPDATE pitfalls.

  • Missing WHERE — the canonical interview trap; updates every row in the table.
  • Implicit type coercionSET int_col = '42' triggers per-row casting; use CAST (see Blog78).
  • Triggering chain reactionsBEFORE UPDATE triggers fire on every row; can compound cost.
  • Updating a primary key — usually disallowed by foreign-key constraints; if needed, delete and reinsert.

SQL interview question — UPDATE FROM JOIN to backfill region from customer

Assume orders(order_id, customer_id, region) where region is sometimes NULL, and customers(customer_id, region) always has the correct value. Backfill orders.region from customers.region for the rows where orders.region IS NULL.

Solution Using UPDATE … FROM JOIN

Detailed explanation. Cross-table UPDATE is the textbook backfill shape. Postgres uses UPDATE … FROM; SQL Server uses UPDATE … FROM target JOIN source; MySQL uses UPDATE target JOIN source. The semantics are identical; the syntax differs.

Code (PostgreSQL).

UPDATE orders o
SET region = c.region
FROM customers c
WHERE o.customer_id = c.customer_id
  AND o.region IS NULL;
Enter fullscreen mode Exit fullscreen mode

Equivalent in MySQL.

UPDATE orders o
JOIN customers c ON o.customer_id = c.customer_id
SET o.region = c.region
WHERE o.region IS NULL;
Enter fullscreen mode Exit fullscreen mode

Equivalent in SQL Server.

UPDATE o
SET region = c.region
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.region IS NULL;
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

step relation outcome
1 Scan orders all order rows
2 Hash JOIN with customers on customer_id matching customer per order
3 WHERE o.region IS NULL only rows needing a backfill
4 SET region = c.region mutate the target column
5 Commit (implicit if no transaction wraps it) changes durable

Output: every NULL region in orders is replaced with the customer's actual region.

Why this works — concept by concept:

  • UPDATE … FROM customers — Postgres's syntax for "update target using a joined source".
  • WHERE o.region IS NULL — the safety predicate; only touches rows that need the backfill.
  • Hash join under the hood — the planner picks an efficient join strategy; Θ(n + m) for the lookup.
  • No triggers fire on unchanged rows — the predicate keeps the change set minimal.
  • Wrap in a transaction for previewBEGIN; UPDATE …; SELECT * FROM orders WHERE region IS NULL; ROLLBACK (or COMMIT).
  • Cost — single scan of orders + hash probe into customers; the update is Θ(matched_rows).

SQL
Topic — data-manipulation
UPDATE SQL drills

Practice →

SQL
Topic — joins
JOIN + UPDATE patterns

Practice →


4. DELETE vs TRUNCATE vs DROP — three operators that look alike

Diagram comparing DELETE, TRUNCATE, and DROP — three side-by-side cards showing the behaviour of each (rows removed, table emptied, table removed), with annotations for WAL / logging cost, rollback support, and identity-sequence reset, on a light PipeCode card.

sql delete, TRUNCATE, and DROP — three operators, three very different behaviours

sql delete removes rows that match a WHERE predicate. TRUNCATE TABLE removes every row in a fast, minimally-logged operation. DROP TABLE removes the table itself (rows AND schema). They look similar at first glance; their behaviour is wildly different — and the delete vs truncate vs drop comparison is one of the most-asked sql interview questions in the DML cluster.

DELETE FROM … WHERE … — row-by-row removal.

  • SignatureDELETE FROM <table> WHERE <predicate>.
  • Per-row WAL logging — every deleted row is logged for rollback; expensive on large tables.
  • Triggers fireBEFORE DELETE and AFTER DELETE triggers run per row.
  • Foreign keys checkedON DELETE CASCADE / RESTRICT rules apply.
  • Rollback supportedDELETE inside a transaction can be rolled back.
  • Identity sequence not reset — the next inserted row gets the next sequence value.
  • delete from sql without WHERE — empties the table row-by-row (slow); never the right tool for full-table reset.

TRUNCATE TABLE — fast whole-table empty.

  • SignatureTRUNCATE TABLE <table> (no WHERE clause possible).
  • Minimal logging — most engines log the page-level deallocation, not per-row.
  • Faster than DELETE by 10-1000× on large tables.
  • Triggers do NOT fire — most engines skip row-level triggers.
  • Identity sequence reset (dialect-dependent) — Postgres TRUNCATE … RESTART IDENTITY; SQL Server resets by default; MySQL resets by default.
  • truncate vs delete — the textbook interview comparison; pick TRUNCATE for full-table reset, DELETE for predicate-based removal.
  • Cannot be rolled back in MySQL InnoDB — implicit commit; Postgres and SQL Server support transactional truncate.

DROP TABLE — remove the table itself.

  • SignatureDROP TABLE <table>.
  • Removes rows, indexes, constraints, triggers, AND the table definition.
  • Irreversible — restoring requires a backup or CREATE TABLE from a saved script.
  • Foreign keysDROP TABLE … CASCADE removes dependent objects too.
  • Use case — temporary tables, dev/test cleanup, deprecated schemas.

delete vs truncate vs drop — the decision table.

Operator Removes rows? Removes schema? Per-row triggers? WAL cost Rollback (in tx)? Sequence reset?
DELETE Yes (per WHERE) No Yes per row (high) Yes No
TRUNCATE Yes (all) No No (most engines) minimal Yes (PG/SQL Server) Yes (most engines)
DROP Yes (all) Yes n/a minimal Yes (PG, SQL Server) n/a

delete from sql with where — the safe shape.

  • Always start with SELECT * FROM t WHERE <predicate> to confirm the row set.
  • Then change SELECT * to DELETE keeping the same WHERE.
  • Wrap in BEGIN; DELETE …; SELECT COUNT(*) FROM t WHERE …; ROLLBACK to preview.
  • RETURNING * (Postgres) shows exactly which rows were deleted.

DELETE with JOIN (the related-tables shape).

-- PostgreSQL: DELETE … USING
DELETE FROM orders o
USING customers c
WHERE o.customer_id = c.customer_id
  AND c.status = 'closed';

-- MySQL: DELETE alias FROM target JOIN source
DELETE o
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE c.status = 'closed';

-- SQL Server: DELETE … FROM
DELETE o
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE c.status = 'closed';
Enter fullscreen mode Exit fullscreen mode

SQL interview question — purge orders for closed customers

Assume orders(order_id, customer_id, status) and customers(customer_id, status). Delete every order whose customer's status is 'closed'. Use a transaction.

Solution Using DELETE … USING + transaction

Detailed explanation. Wrap the destructive operation in a transaction so a preview is safe. DELETE … USING (Postgres) is the multi-table delete shape; SQL Server uses DELETE … FROM; MySQL uses DELETE alias FROM … JOIN ….

Code (PostgreSQL).

BEGIN;

-- preview first
SELECT o.order_id, o.customer_id, c.status
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE c.status = 'closed';

DELETE FROM orders o
USING customers c
WHERE o.customer_id = c.customer_id
  AND c.status = 'closed'
RETURNING o.order_id, o.customer_id;

COMMIT;
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

step relation outcome
1 BEGIN open transaction
2 Preview SELECT confirm the rows that would be deleted
3 DELETE … USING joins to customers identify orders whose customer is closed
4 RETURNING o.order_id, o.customer_id capture deleted-row identifiers
5 COMMIT changes durable

Output: the RETURNING clause emits one row per deleted order.

Why this works — concept by concept:

  • BEGIN / COMMIT wraps the destructive op — a ROLLBACK mid-transaction undoes the delete without permanent damage.
  • Preview SELECT first — confirms the predicate matches the expected rows.
  • DELETE … USING — Postgres's multi-table delete syntax; same idea as MySQL's DELETE a FROM a JOIN b.
  • RETURNING captures deleted IDs — useful for downstream logging or compensating actions.
  • No CASCADE concerns — the delete is on orders only; customers rows are untouched.
  • Cost — single scan + hash probe; deletes Θ(matched_rows) rows.

SQL
Topic — data-manipulation
DELETE SQL drills

Practice →

SQL
Topic — database
Database SQL library

Practice →


5. UPSERT, MERGE, and ON CONFLICT — the idempotent write pattern

Diagram of UPSERT shapes — three side-by-side cards showing PostgreSQL INSERT … ON CONFLICT, MySQL INSERT … ON DUPLICATE KEY UPDATE, and SQL Server / Oracle / Snowflake MERGE INTO, on a light PipeCode card.

upsert sql and merge sql — insert if new, update if exists

upsert sql is the idempotent write pattern: insert the row if it doesn't exist, update it if it does. Every dialect spells this differently — INSERT … ON CONFLICT DO UPDATE (Postgres), INSERT … ON DUPLICATE KEY UPDATE (MySQL), MERGE INTO (SQL Server / Oracle / Snowflake / BigQuery) — but the semantics are identical. The merge into sql server / insert on conflict postgres family is the second-most-asked dml in sql interview topic after delete vs truncate.

PostgreSQL INSERT … ON CONFLICT DO UPDATE — the most readable.

INSERT INTO users (email, name, last_login)
VALUES ('alice@example.com', 'Alice', NOW())
ON CONFLICT (email) DO UPDATE
SET name       = EXCLUDED.name,
    last_login = EXCLUDED.last_login;
Enter fullscreen mode Exit fullscreen mode
  • ON CONFLICT (col) — names the unique key that triggers conflict resolution.
  • DO UPDATE SET … — the update clause; EXCLUDED.col references the values from the attempted insert.
  • DO NOTHING — skip the row instead of updating; useful for idempotent inserts.
  • insert on conflict postgres is the most-searched form of this pattern.

MySQL INSERT … ON DUPLICATE KEY UPDATE.

INSERT INTO users (email, name, last_login)
VALUES ('alice@example.com', 'Alice', NOW())
ON DUPLICATE KEY UPDATE
    name       = VALUES(name),
    last_login = VALUES(last_login);
Enter fullscreen mode Exit fullscreen mode
  • ON DUPLICATE KEY — fires when any unique key would be violated.
  • VALUES(col) — references the value from the attempted insert (deprecated in MySQL 8.0.20+; replaced by AS new SET col = new.col).
  • Modern MySQLINSERT INTO users (…) VALUES (…) AS new ON DUPLICATE KEY UPDATE col = new.col.

SQL Server / Oracle / Snowflake MERGE INTO.

MERGE INTO users AS target
USING (VALUES ('alice@example.com', 'Alice', CURRENT_TIMESTAMP)) AS src(email, name, last_login)
   ON target.email = src.email
WHEN MATCHED THEN
    UPDATE SET name = src.name, last_login = src.last_login
WHEN NOT MATCHED THEN
    INSERT (email, name, last_login) VALUES (src.email, src.name, src.last_login);
Enter fullscreen mode Exit fullscreen mode
  • MERGE INTO target USING source ON predicate — ANSI-standard merge.
  • WHEN MATCHED THEN UPDATE — runs when the predicate matches.
  • WHEN NOT MATCHED THEN INSERT — runs when the predicate doesn't match.
  • WHEN NOT MATCHED BY SOURCE THEN DELETE (SQL Server) — full three-way merge.
  • merge into sql server has been GA since SQL Server 2008; PostgreSQL added MERGE in 15.

Snowflake / BigQuery MERGE.

  • Snowflake — uses the same MERGE INTO syntax as SQL Server; very common in cloud warehouse interviews.
  • BigQueryMERGE with the same shape; used heavily in slowly-changing-dimension (SCD) Type 2 pipelines.

The use cases that demand UPSERT.

  • Idempotent ingest — re-running a pipeline must not double-insert.
  • Slowly-changing dimensions — customer dimension updates without losing history (combine MERGE with effective-date columns).
  • Event aggregation — running totals that update if the (entity, day) key exists, insert otherwise.
  • CDC / change-data-capture — apply inserts, updates, and deletes from a source change feed.

When to reach for which.

  • PostgreSQL primary stackINSERT … ON CONFLICT DO UPDATE; most readable, native, and well-supported.
  • MySQL primary stackINSERT … ON DUPLICATE KEY UPDATE; equally native.
  • SQL Server / Oracle / Snowflake / BigQueryMERGE INTO; the only option, also the most powerful (three-way matched / not-matched / not-matched-by-source).
  • PostgreSQL 15+ — also has MERGE; pick ON CONFLICT for upserts and MERGE for three-way logic.

SQL
Topic — data-manipulation
UPSERT / MERGE drills

Practice →

SQL
Topic — etl
ETL + UPSERT patterns

Practice →


6. Transactions, rollback, and write safety

BEGIN, COMMIT, ROLLBACK — the safety net every CRUD statement deserves

Every destructive sql update or sql delete belongs inside a transaction. The pattern is the same on every dialect: open the transaction, run the DML, inspect the result, then commit or roll back. The cost is one extra line of SQL; the upside is that you can preview and undo any mistake before it becomes permanent.

The canonical preview-and-commit pattern.

BEGIN;

-- preview the change
SELECT * FROM orders WHERE region = 'unknown';

-- do the change
UPDATE orders SET region = 'EU' WHERE region = 'unknown';

-- inspect again
SELECT * FROM orders WHERE region = 'unknown';   -- should be 0 rows
SELECT * FROM orders WHERE region = 'EU';         -- should include the updated rows

-- if happy:
COMMIT;
-- if not:
-- ROLLBACK;
Enter fullscreen mode Exit fullscreen mode

Transaction primitives by dialect.

  • PostgreSQLBEGIN; … COMMIT; or BEGIN; … ROLLBACK;; also SAVEPOINT and ROLLBACK TO SAVEPOINT.
  • MySQLSTART TRANSACTION; … COMMIT;; SET autocommit = 0 to make every statement transactional.
  • SQL ServerBEGIN TRANSACTION; … COMMIT TRANSACTION; or ROLLBACK TRANSACTION.
  • OracleSET TRANSACTION READ WRITE; … COMMIT;; implicit transaction starts on first DML.
  • SnowflakeBEGIN; … COMMIT;; full ACID semantics.

ACID properties — what transactions guarantee.

  • Atomicity — all DML in the transaction commits together, or none does.
  • Consistency — constraints (NOT NULL, foreign keys, unique) are checked at commit time.
  • Isolation — concurrent transactions don't see each other's uncommitted changes (depends on isolation level).
  • Durability — once committed, the change survives a crash (WAL / redo log).

Isolation levels — the cheat-bullet.

  • READ UNCOMMITTED — sees uncommitted writes from others ("dirty reads"); rarely used.
  • READ COMMITTED — sees only committed writes; the default in Postgres, SQL Server, Oracle.
  • REPEATABLE READ — repeated reads in the same transaction return the same result; MySQL default.
  • SERIALIZABLE — strongest; transactions appear to execute one at a time; sometimes via retry.

ROLLBACK — undo without commit.

  • Inside a transactionROLLBACK discards every change since BEGIN.
  • SAVEPOINT s1 + ROLLBACK TO s1 — partial rollback to a named point inside a transaction.
  • Implicit rollback — on session disconnect, network error, server crash; uncommitted changes are discarded.
  • The senior signal — knowing that ROLLBACK is your friend and using it freely.

The cost of long-running transactions.

  • Lock duration — a long UPDATE holds row-level locks until commit; concurrent writers wait.
  • WAL / undo log size — large transactions inflate the log; vacuum / log truncation depends on commit.
  • Snapshot bloat (Postgres) — MVCC snapshots accumulate; long transactions block VACUUM from reclaiming space.
  • Production discipline — break large batches into smaller transactions; commit every 1k–10k rows.

RETURNING / OUTPUT — pair with transactions.

  • UPDATE … RETURNING … (Postgres) — captures changed rows inside the transaction.
  • OUTPUT INSERTED.*, DELETED.* (SQL Server) — captures both old and new values; perfect for audit logs.
  • Use case — write an audit-trail INSERT in the same transaction that captures the UPDATE/DELETE side-effects.

SQL
Topic — database
Transactions SQL drills

Practice →

SQL
Topic — data-manipulation
Transactional CRUD patterns

Practice →


7. DML interview gotchas — missing WHERE, fan-out, lock escalation

The seven bugs interviewers test most often on CRUD

DML has a small surface area but a long tail of edge cases that fail candidates. These are the seven gotchas reviewers test most often when crud sql comes up in an interview.

Gotcha 1 — UPDATE / DELETE without WHERE.

  • The bugUPDATE orders SET status = 'paid' updates every row in orders.
  • Symptom — production-data corruption; "I just lost an afternoon" engineer slack messages.
  • Fix — always preview with SELECT * FROM t WHERE <predicate> first; wrap in a transaction.
  • The discipline — type WHERE before SET when writing UPDATE from scratch.

Gotcha 2 — fan-out from UPDATE … FROM JOIN.

  • The bug — joining to a source table with multiple matching rows updates the target row N times with the last match (engine-dependent).
  • Symptom — non-deterministic results; "the update sometimes does the right thing".
  • Fix — pre-aggregate the source to one row per join key inside a CTE.

Gotcha 3 — DELETE cascade chains too far.

  • The bugON DELETE CASCADE foreign keys propagate deletes through unrelated tables.
  • Symptom — deleting one customer wipes out their orders, their payments, their reviews — none of which the engineer intended.
  • Fix — confirm the foreign-key topology; use RESTRICT instead of CASCADE for safety; soft-delete via a deleted_at column instead.

Gotcha 4 — TRUNCATE cannot be rolled back in MySQL.

  • The behaviour — MySQL InnoDB TRUNCATE does an implicit commit; once issued, no ROLLBACK recovers the data.
  • Symptom — "I wrapped it in START TRANSACTION and it still committed".
  • Fix — use DELETE FROM t inside a transaction for previewable removal; or take a backup before TRUNCATE.

Gotcha 5 — INSERT … SELECT without dedup violates UNIQUE.

  • The bug — staging table has duplicates; insert into a table with a UNIQUE constraint errors on the second occurrence.
  • SymptomERROR: duplicate key value violates unique constraint.
  • Fix — dedup with DISTINCT ON / ROW_NUMBER() = 1 (Postgres) or INSERT … ON CONFLICT DO NOTHING for an idempotent ingest.

Gotcha 6 — implicit type coercion on the SET side of UPDATE.

  • The bugSET int_col = '42' silently parses the string; if the string is 'abc' the update errors mid-batch and leaves the table partially updated (if no transaction wraps it).
  • Fix — explicit CAST (see Blog78); validate upstream; wrap in a transaction so partial application can roll back.

Gotcha 7 — lock escalation on large UPDATE / DELETE.

  • The behaviour — many row-level locks get promoted to table-level locks above an engine-specific threshold; concurrent writers block until the operation completes.
  • Symptom — production app freezes for the duration of a multi-million-row UPDATE.
  • Fix — batch the change in chunks (WHERE id BETWEEN 1 AND 10000, LIMIT 10000, etc.) and commit between chunks.

SQL
Topic — data-manipulation
DML gotcha drills

Practice →

SQL
Topic — database
Database safety drills

Practice →


Choosing the right DML operator (cheat sheet)

A one-screen cheat sheet for using SQL INSERT, UPDATE, DELETE — pick the verb that matches your intent, then the dialect-specific syntax.

You want to … Operator Notes
Add new rows from literals INSERT INTO t (cols) VALUES (…) Always include the column list
Add many rows in one statement INSERT INTO t (cols) VALUES (…), (…), (…) Atomic; 10-100× faster than N inserts
Add rows from another table INSERT INTO t (cols) SELECT … FROM s The ETL workhorse
Bulk-load from a file COPY (PG) / LOAD DATA INFILE (MySQL) / BULK INSERT (SQL Server) For > 100k rows
Change existing rows UPDATE t SET … WHERE … Always WHERE
Update from another table UPDATE t SET col = s.val FROM s WHERE … (PG) UPDATE t JOIN s (MySQL); UPDATE t … FROM (SQL Server)
Remove specific rows DELETE FROM t WHERE … Per-row WAL; supports rollback
Empty a whole table fast TRUNCATE TABLE t Minimally logged; resets identity
Remove the table itself DROP TABLE t Irreversible without backup
Insert if new, update if exists INSERT … ON CONFLICT DO UPDATE (PG) ON DUPLICATE KEY UPDATE (MySQL); MERGE INTO (SQL Server, Oracle, Snowflake)
Idempotent insert (skip duplicates) INSERT … ON CONFLICT DO NOTHING Safe for re-runnable pipelines
Three-way merge (insert/update/delete) MERGE INTO t USING s ON … SCD Type 2 pattern
Preview before commit BEGIN; SELECT …; DML; SELECT …; ROLLBACK; The safety pattern
Capture affected rows … RETURNING * (PG/Oracle) / OUTPUT INSERTED.*, DELETED.* (SQL Server) One round-trip audit

Frequently asked questions

What's the difference between DELETE, TRUNCATE, and DROP in SQL?

DELETE removes rows that match a WHERE predicate — it's row-by-row, fully logged, triggers fire, foreign keys are checked, and the change can be rolled back inside a transaction. TRUNCATE TABLE removes every row in a single fast operation — minimally logged, no WHERE allowed, no triggers fire on most engines, the identity sequence usually resets, and it's 10-1000× faster than DELETE on large tables (but cannot be rolled back in MySQL InnoDB). DROP TABLE removes the table itself — rows AND schema, indexes, constraints, triggers — and is irreversible without a backup. The canonical interview rule: DELETE for row-level removal with rollback safety; TRUNCATE for fast whole-table reset; DROP for deleting the table definition entirely.

How do I write an UPSERT in SQL?

Every dialect spells UPSERT differently. PostgreSQLINSERT INTO t (cols) VALUES (…) ON CONFLICT (unique_col) DO UPDATE SET col = EXCLUDED.col. MySQLINSERT INTO t (cols) VALUES (…) ON DUPLICATE KEY UPDATE col = VALUES(col) (or the modern … AS new ON DUPLICATE KEY UPDATE col = new.col since 8.0.20). SQL Server / Oracle / Snowflake / BigQuery / PostgreSQL 15+MERGE INTO t USING source ON match_predicate WHEN MATCHED THEN UPDATE SET … WHEN NOT MATCHED THEN INSERT (…) VALUES (…). The semantics are identical across all of them: insert if no row matches the unique key, update if a row already exists. UPSERT is the foundation of idempotent pipelines — re-running an ingest must not double-insert, and INSERT … ON CONFLICT / MERGE makes that a one-liner.

What happens if I run UPDATE or DELETE without a WHERE clause?

It updates or deletes every row in the table — silently, with no warning. This is the canonical sql interview questions trap and the source of more "I just nuked production" stories than any other DML mistake. The defence: never write UPDATE or DELETE from a blank starting point; type the WHERE clause first, then the SET / FROM. Wrap risky DML in a transaction (BEGIN; UPDATE …; SELECT *; ROLLBACK). Preview with SELECT * FROM t WHERE <predicate> to confirm the row set before changing the verb. Some engines (MySQL 8 with safe-updates mode) refuse to execute UPDATE / DELETE without a WHERE clause that includes a key column — enable this setting in development environments.

How do I update one table based on values in another table?

Use cross-table UPDATE — the dialect-specific syntax differs but the semantics are identical. PostgreSQLUPDATE target t SET col = s.val FROM source s WHERE t.id = s.id. MySQLUPDATE target t JOIN source s ON t.id = s.id SET t.col = s.val. SQL ServerUPDATE t SET col = s.val FROM target t JOIN source s ON t.id = s.id. Oracle / Snowflake — use MERGE INTO target USING source ON (…) WHEN MATCHED THEN UPDATE SET …. Always include a WHERE clause (or a MERGE join predicate) that scopes the update; pre-aggregate the source to one row per join key inside a CTE if the source can have duplicates, otherwise the update fans out non-deterministically.

Should I use INSERT INTO SELECT or COPY / LOAD DATA INFILE for bulk loading?

For under ~100k rows, INSERT INTO t (cols) SELECT … FROM staging (or multi-row INSERT … VALUES) is fine — the planner builds one execution plan and writes are batched. For larger loads, switch to the dialect's bulk loader: PostgreSQL COPY FROM (the fastest by far for file ingest, bypasses much of the WAL overhead), MySQL LOAD DATA INFILE, SQL Server BULK INSERT or bcp, Snowflake COPY INTO (from S3 / GCS / Azure), BigQuery's LOAD DATA or table-loader API. The bulk loaders skip per-row WAL formatting and trigger firing where possible — they trade some safety for raw throughput. Inside an ETL pipeline, the standard pattern is: bulk-load into a staging table, then run INSERT INTO live SELECT … FROM staging with explicit deduplication and constraint checks.


Practice on PipeCode

PipeCode ships 450+ data-engineering interview problems — including PostgreSQL-first SQL practice keyed to DML CRUD (INSERT, UPDATE, DELETE), UPSERT patterns with ON CONFLICT / MERGE, transactions, ETL ingest, and the dialect quirks that fail candidates who memorise only one engine. Whether you're drilling sql interview questions with answers for sql for data engineers loops or grinding through data engineering interview questions end-to-end, the practice library mirrors the same safety-first mental model this guide teaches.

Kick off via Explore practice →; drill the data-manipulation SQL lane →; fan out into the ETL SQL lane →; rehearse database problems →; reinforce JOIN + UPDATE patterns →; widen coverage on the full SQL practice library →.

Top comments (0)