sql create table defines the shape of every relation in a database — columns and their types, primary keys and unique constraints, foreign-key references, NOT NULL guarantees, DEFAULT values, and CHECK predicates that the engine enforces on every write. sql alter table is how that shape changes after launch — adding columns, dropping columns, modifying types, renaming tables, attaching new constraints. Together they form sql ddl (Data Definition Language), the sql schema design vocabulary every sql interview questions panel tests when the conversation moves past SELECT and into "how would you model this."
This guide walks through every clause in the create table sql family that reviewers love to test in data engineering interview questions: the full CREATE TABLE signature with column types and constraints, the PRIMARY KEY / FOREIGN KEY / NOT NULL / UNIQUE / CHECK constraint catalogue, DEFAULT values and computed columns, the alter table sql vocabulary (ADD COLUMN, DROP COLUMN, ALTER COLUMN, RENAME, ADD CONSTRAINT), create index sql patterns paired with table design, online-migration safety on production tables, and the seven gotchas (locking, type-change rewrites, dropped indexes, FK cascade chains) that fail most candidates. Every section ends as sql interview questions with answers: a runnable PostgreSQL DDL plus DML query, a traced execution, an output table, and a concept-by-concept why this works breakdown — the exact shape sql for data engineers rounds reward when ddl in sql comes up.
When you want hands-on reps immediately after reading, browse SQL practice library →, drill the database problems →, sharpen indexing SQL drills →, rehearse data-manipulation patterns →, or widen coverage on the full SQL practice library →.
On this page
- Why CREATE TABLE and ALTER TABLE matter in data engineering interviews
- Anatomy of CREATE TABLE — columns, types, and the constraint catalogue
- PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK, NOT NULL
- DEFAULT values, computed columns, and identity / SERIAL
- ALTER TABLE — ADD, DROP, MODIFY, RENAME column patterns
- Indexes paired with table design — CREATE INDEX
- Online migration gotchas — locks, type-change rewrites, FK chains
- Choosing the right DDL pattern (cheat sheet)
- Frequently asked questions
- Practice on PipeCode
1. Why CREATE TABLE and ALTER TABLE matter in data engineering interviews
Schema design is what separates juniors from seniors — every constraint is a future bug prevented
The one-sentence invariant: sql create table defines the relation's shape and the constraints the engine enforces on every write; sql alter table evolves that shape over time without dropping the data. Once you internalise that — and learn which constraints belong in the schema vs in the application layer — every prompt in the sql schema design family becomes a question of "what does this data actually require to be valid?"
The CREATE TABLE mental model in three bullets.
-
Columns + types define the row shape —
order_id BIGINT,region VARCHAR(10),amount NUMERIC(12,2),created_at TIMESTAMPTZ. -
Constraints define what makes a row valid —
PRIMARY KEY,FOREIGN KEY,NOT NULL,UNIQUE,CHECK,DEFAULT. -
Indexes are not "constraints" but live alongside the table —
PRIMARY KEYandUNIQUEimplicitly create them;CREATE INDEXadds them explicitly.
The five constraint types in the catalogue.
-
PRIMARY KEY— uniquely identifies a row; implicitlyNOT NULL+UNIQUE; one per table; the most-askedcreate table sqlconstraint in interviews. -
FOREIGN KEY— references another table's primary or unique key; enforces referential integrity (ON DELETE CASCADE/RESTRICT/SET NULLcascade rules). -
NOT NULL— guarantees the column is neverNULL; the cheapest, most-impactful constraint to add. -
UNIQUE— guarantees no duplicate values across rows; one or manyUNIQUEcolumns per table. -
CHECK (predicate)— engine evaluates the predicate on every write; row must satisfy it.
Why interviewers love DDL questions.
- It surfaces data modelling instincts — the right primary key, the right level of normalisation, the right column types.
-
It tests dialect knowledge —
SERIAL(Postgres) vsAUTO_INCREMENT(MySQL) vsIDENTITY(SQL Server);JSONB(Postgres) vsJSON(MySQL) vsVARIANT(Snowflake). -
It probes migration safety — does the candidate know that
ALTER TABLE ALTER COLUMN TYPEcan rewrite a multi-gigabyte table? -
It hits operational realism —
CREATE INDEX CONCURRENTLY,ALTER TABLE … VALIDATE CONSTRAINT,online DDLon MySQL.
What interviewers listen for.
- Do you name the primary key explicitly and explain why? — "I'm using
(order_id)because it's the natural unique identifier." - Do you think about NULLability before declaring every column? — "
regionisNOT NULLbecause the application never writes an unknown region." - Do you mention FK ON DELETE behaviour when designing relationships? — senior signal.
- Do you reach for
CREATE INDEX CONCURRENTLYwhen adding an index on a busy production table? — bonus points (Postgres).
Worked example — design and populate an orders table
Detailed explanation. Realistic table design starts from the questions the table will answer ("what's the revenue per region last week?") and the cardinality of the data ("a few million rows a year"). The constraints encode what every row must satisfy; the column types match the natural domain.
Question. Design an orders table holding order_id, customer_id (foreign key), region, amount, status, created_at. Include a primary key, a foreign key to a customers(customer_id) table, NOT NULL on every column, a CHECK that amount > 0, a DEFAULT for status, and an index on (created_at) for time-range queries.
Code (PostgreSQL).
CREATE TABLE orders (
order_id BIGSERIAL PRIMARY KEY,
customer_id BIGINT NOT NULL
REFERENCES customers (customer_id) ON DELETE RESTRICT,
region VARCHAR(10) NOT NULL,
amount NUMERIC(12,2) NOT NULL CHECK (amount > 0),
status VARCHAR(20) NOT NULL DEFAULT 'pending',
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_orders_created_at ON orders (created_at);
CREATE INDEX idx_orders_customer ON orders (customer_id);
Step-by-step explanation.
-
order_id BIGSERIAL PRIMARY KEY— auto-incrementing 8-byte integer primary key. -
customer_idwithREFERENCES customers (customer_id) ON DELETE RESTRICT— FK that blocks deleting a customer who has orders. -
region,amount,status,created_at— allNOT NULLto make missing values impossible. -
CHECK (amount > 0)— engine refuses any insert / update that violates the predicate. -
DEFAULT 'pending'andDEFAULT NOW()— sensible auto-values when omitted. - Two explicit
CREATE INDEXstatements for the common access patterns (time range and customer-scoped queries).
Output (after one insert).
| order_id | customer_id | region | amount | status | created_at |
|---|---|---|---|---|---|
| 1 | 42 | US | 100.00 | pending | 2026-05-23 14:30:00+00 |
Rule of thumb: design the table from the queries it will serve, not from the source data's shape; constraints are cheaper to add at CREATE TABLE time than after launch.
SQL
Topic — database
Schema design SQL drills
SQL
Topic — sql
SQL practice library
2. Anatomy of CREATE TABLE — columns, types, and the constraint catalogue
create table sql — the full signature, slice by slice
Before any specific constraint, the mental model: a CREATE TABLE statement is a slice-by-slice declaration of column-by-column shape, terminated by optional table-level constraints. Read it top-to-bottom and you've described every guarantee the engine will enforce.
The full CREATE TABLE signature.
CREATE TABLE <table_name> (
<col_name> <data_type> [<column_constraints>],
<col_name> <data_type> [<column_constraints>],
…,
[<table_constraints>]
);
-
Table name — schema-qualified (
schema.table) or unqualified (uses the default schema). - Column definitions — type plus zero or more column-level constraints.
- Table-level constraints — composite PK, FK, UNIQUE, CHECK spanning multiple columns.
Common data types — the cross-dialect cheat.
-
Integer —
INTEGER(32-bit),BIGINT(64-bit),SMALLINT(16-bit); the universal numeric tier. -
Decimal —
NUMERIC(p, s)/DECIMAL(p, s)— fixed precision for currency;ptotal digits,sscale. -
Float —
REAL(single),DOUBLE PRECISION(double); for scientific computation, not money. -
String —
VARCHAR(n)(max length n),TEXT(unbounded in Postgres / MySQL). -
Date / time —
DATE,TIMESTAMP,TIMESTAMPTZ(Postgres) /DATETIME(MySQL / SQL Server). See Blog74. -
Boolean —
BOOLEAN(Postgres / MySQL 8),BIT(SQL Server). -
JSON —
JSONB(Postgres, indexed),JSON(MySQL),VARIANT(Snowflake). -
UUID —
UUID(Postgres native),UNIQUEIDENTIFIER(SQL Server),CHAR(36)portable.
Column-level vs table-level constraints.
-
Column-level — apply to one column, declared inline:
email VARCHAR(255) UNIQUE NOT NULL. -
Table-level — apply to one or many columns, declared after the column list:
UNIQUE (col_a, col_b),PRIMARY KEY (col_a, col_b). -
Composite keys require table-level form —
PRIMARY KEY (region, customer_id)is a two-column PK. -
Named constraints —
CONSTRAINT chk_amount_positive CHECK (amount > 0)— gives the constraint a name you can reference inALTER TABLE … DROP CONSTRAINT.
Optional clauses around CREATE TABLE.
-
CREATE TABLE IF NOT EXISTS— idempotent; useful in migration scripts. -
CREATE TEMPORARY TABLE— session-scoped table dropped on session end. -
CREATE TABLE … AS SELECT(CTAS) — create and populate in one statement; column types inferred from the source query. -
CREATE TABLE … (LIKE other_table)— copy schema from another table (Postgres / MySQL). -
CREATE TABLE … PARTITION BY RANGE / LIST / HASH— partitioned tables for very large fact tables.
Dialect divergence — the things every interview tests.
-
Auto-increment —
SERIAL/BIGSERIAL/IDENTITY(Postgres 10+);AUTO_INCREMENT(MySQL);IDENTITY(1,1)(SQL Server);SEQUENCE(Oracle). -
String types —
TEXTis preferred in Postgres (no penalty vsVARCHAR); MySQL hasVARCHAR(n)size matter for index byte-budget. -
Boolean — Postgres / MySQL 8 native
BOOLEAN; SQL Server usesBIT (0/1). -
JSON — Postgres
JSONBis fully indexed; MySQLJSONis binary but indexable via generated columns. -
Timestamps with TZ — Postgres
TIMESTAMPTZ; SnowflakeTIMESTAMP_TZ; SQL ServerDATETIMEOFFSET. See Blog74 §2.
SQL
Topic — database
CREATE TABLE drills
SQL
Topic — type-handling
Data-type SQL practice
3. PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK, NOT NULL
sql constraints — five rules the engine enforces on every write
The five constraint types are the heart of sql schema design. Each one is a guarantee the database makes about every row that lands in the table — and every guarantee is a class of bug the application layer no longer needs to defend against.
PRIMARY KEY — the canonical identifier.
-
Signature —
<col> <type> PRIMARY KEY(column-level) orPRIMARY KEY (col_a, col_b)(table-level for composite). -
Implicit guarantees —
NOT NULLandUNIQUE. - One per table — there can only be one PRIMARY KEY constraint.
- Implicit index — every PRIMARY KEY automatically creates an index on its columns.
-
Choice of key type — surrogate (auto-increment
BIGSERIAL/IDENTITY) vs natural (an existing unique business value likeemail); surrogate is the safer default.
FOREIGN KEY — referential integrity.
-
Column-level —
customer_id BIGINT REFERENCES customers (customer_id). -
Table-level —
FOREIGN KEY (customer_id) REFERENCES customers (customer_id). -
ON DELETEactions —CASCADE(delete dependent rows),RESTRICT/NO ACTION(block the parent delete),SET NULL(null out the FK),SET DEFAULT(revert to the column's default). -
ON UPDATEactions — same options for parent-key updates. - Performance cost — every insert into the child table checks the parent for the referenced row; ensure the referenced column is indexed (it is by default if it's the parent's PK).
-
sql foreign keyis one of the most-askedcreate table sqlinterview prompts.
NOT NULL — the cheapest, most-impactful constraint.
-
Inline —
region VARCHAR(10) NOT NULL. -
Default behaviour — columns are nullable unless
NOT NULLis declared. -
Catches missing values upstream — better than discovering
NULLs in downstream reports. -
Almost-always-on rule — columns should be
NOT NULLunless missing data is semantically meaningful.
UNIQUE — no duplicates allowed.
-
Column-level —
email VARCHAR(255) UNIQUE. -
Table-level —
UNIQUE (col_a, col_b)for composite uniqueness. -
Implicit index — every
UNIQUEconstraint creates an index. -
NULLexception — most engines allow multipleNULLs in aUNIQUEcolumn (becauseNULL = NULLisUNKNOWN); SQL Server treatsNULLas a single value forUNIQUEpurposes. -
The natural-key pattern — when a business value (like
email) is already unique, add aUNIQUEconstraint on top of the surrogate PRIMARY KEY.
CHECK (predicate) — domain-level validation.
-
Column-level —
amount NUMERIC(12,2) CHECK (amount > 0). -
Table-level —
CHECK (start_date < end_date)referencing multiple columns. -
Engine evaluates on every insert / update — refuses the row if the predicate is
FALSE(returnsUNKNOWNforNULL). - Use cases — non-negative amounts, valid status enum values, sane date ranges, length bounds.
-
MySQL caveat — supports
CHECKsyntax since 5.7 but only enforces it since 8.0.16.
DEFAULT <expr> — implicit value on omitted insert.
-
Inline —
status VARCHAR(20) NOT NULL DEFAULT 'pending'. -
Common patterns —
DEFAULT NOW()(Postgres) /DEFAULT CURRENT_TIMESTAMP(MySQL / SQL Server) forcreated_atcolumns. -
Generated default —
DEFAULT uuid_generate_v4()for UUID PKs. -
Pairs cleanly with
NOT NULL— the engine fills in the value when the insert omits the column.
sql primary key vs foreign key — the interview comparison.
- PRIMARY KEY — uniquely identifies rows in this table.
- FOREIGN KEY — references the primary (or unique) key of another table.
- PK is local, FK is referential — that single sentence is the textbook interview answer.
SQL interview question — orders + customers schema with FK and cascading rules
Assume you need customers(customer_id, email, name, status) and orders(order_id, customer_id, amount, status, created_at). Design both tables with primary keys, a foreign key from orders.customer_id to customers.customer_id, an FK action that blocks deleting a customer who has orders, and constraints that prevent invalid data.
Solution Using CREATE TABLE + named constraints
Code (PostgreSQL).
CREATE TABLE customers (
customer_id BIGSERIAL PRIMARY KEY,
email VARCHAR(255) NOT NULL UNIQUE,
name VARCHAR(100) NOT NULL,
status VARCHAR(20) NOT NULL DEFAULT 'active'
CHECK (status IN ('active', 'suspended', 'closed'))
);
CREATE TABLE orders (
order_id BIGSERIAL PRIMARY KEY,
customer_id BIGINT NOT NULL,
amount NUMERIC(12,2) NOT NULL CHECK (amount > 0),
status VARCHAR(20) NOT NULL DEFAULT 'pending'
CHECK (status IN ('pending', 'paid', 'refund', 'cancel')),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
CONSTRAINT fk_orders_customer
FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
ON DELETE RESTRICT
);
CREATE INDEX idx_orders_customer ON orders (customer_id);
CREATE INDEX idx_orders_created ON orders (created_at);
Step-by-step trace.
| step | object | outcome |
|---|---|---|
| 1 |
customers PK + UNIQUE on email |
natural-key uniqueness plus surrogate PK |
| 2 | customers.status CHECK |
only the three enum values allowed |
| 3 |
orders PK + named FK + CHECKs |
full referential integrity plus domain checks |
| 4 | ON DELETE RESTRICT |
prevents deleting a customer who has any order |
| 5 | Two indexes on orders
|
match the common query access patterns |
Output: the schema enforces every invariant at write time; the application never has to defend against unknown statuses or orphaned orders.
Why this works — concept by concept:
- BIGSERIAL surrogate PK — auto-incrementing 8-byte integer; the safe default unless the data has a natural key.
- UNIQUE on customers.email — enforces business-level uniqueness while the surrogate handles internal joins.
- CHECK status IN (…) — engine refuses any insert / update with an unknown status (cheaper than application-layer validation).
- FK ON DELETE RESTRICT — blocks the parent delete; the senior alternative to silent cascade chains.
-
Named FK constraint —
CONSTRAINT fk_orders_customerlets youALTER TABLE … DROP CONSTRAINTlater if needed. - Two explicit indexes — match real query access patterns (customer-scoped and time-range queries).
-
Cost — index maintenance is
Θ(log n)per insert; the constraint enforcement is constant per row.
SQL
Topic — database
Constraints SQL drills
SQL
Topic — sql
Foreign-key SQL patterns
4. DEFAULT values, computed columns, and identity / SERIAL
default value sql and auto-generated columns — let the engine fill in what you don't write
DEFAULT is the unsung hero of schema design. Every column that has a "sensible value when missing" should have a DEFAULT clause — created_at defaults to NOW(), status defaults to 'pending', surrogate IDs default to the next sequence value. The pattern eliminates entire classes of "I forgot to set this column" bugs.
DEFAULT <literal> — the simplest form.
-
String —
status VARCHAR(20) NOT NULL DEFAULT 'pending'. -
Number —
retry_count INTEGER NOT NULL DEFAULT 0. -
Boolean —
is_active BOOLEAN NOT NULL DEFAULT TRUE. -
NULL —
optional_col VARCHAR(50) DEFAULT NULL(rarely needed; the absence ofNOT NULLalready permits NULL).
DEFAULT <expression> — dynamic defaults.
-
Current timestamp —
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()(Postgres) /DEFAULT CURRENT_TIMESTAMP(MySQL / SQL Server). -
UUID —
id UUID PRIMARY KEY DEFAULT gen_random_uuid()(Postgres 13+) /DEFAULT (UUID())(MySQL 8). -
Sequence —
order_id BIGINT NOT NULL DEFAULT nextval('orders_seq')(Postgres explicit sequence).
Auto-increment / IDENTITY / SERIAL — surrogate primary keys.
-
PostgreSQL —
BIGSERIAL(legacy) andBIGINT GENERATED ALWAYS AS IDENTITY(SQL standard, since PG 10). -
MySQL —
BIGINT AUTO_INCREMENT PRIMARY KEY. -
SQL Server —
BIGINT IDENTITY(1,1) PRIMARY KEY. -
Oracle —
NUMBER GENERATED BY DEFAULT AS IDENTITYor aSEQUENCE+ trigger. -
Snowflake —
NUMBER AUTOINCREMENT START 1 INCREMENT 1.
The choice of integer type for the PK.
-
INTEGER(32-bit, max 2.1B) — safe for moderate-volume tables; will overflow on very-large-scale OLTP. -
BIGINT(64-bit, max 9.2 × 10¹⁸) — the safe default for new tables; effectively never overflows. -
UUID— globally unique without coordination; larger (16 bytes vs 8); harder to index; useful for distributed systems. -
The senior rule of thumb —
BIGINTsurrogate PK + business-keyUNIQUEconstraint.
Computed / generated columns — values derived from other columns.
-
PostgreSQL —
total NUMERIC GENERATED ALWAYS AS (quantity * unit_price) STORED. -
MySQL —
total NUMERIC AS (quantity * unit_price) STORED(orVIRTUALfor non-materialised). -
SQL Server —
total AS (quantity * unit_price) PERSISTED. -
Use cases — denormalised aggregates, JSON path extraction (
(payload->>'amount')::numeric), uppercase-shadow columns for case-insensitive search.
identity column sql — the modern SQL-standard form.
-
<col> <type> GENERATED ALWAYS AS IDENTITY— the column is always system-generated; manual inserts to it error. -
<col> <type> GENERATED BY DEFAULT AS IDENTITY— auto-generated unless an explicit value is provided. - Supported in — Postgres 10+, Oracle 12c+, DB2; gradually replacing dialect-specific syntax.
Common DEFAULT pitfalls.
-
DEFAULTwith non-deterministic functions —NOW()is fine (evaluated at insert time), butRANDOM()may evaluate once at definition (dialect-specific). -
DEFAULTdoes not back-fill existing rows — addingDEFAULTto a column doesn't change existing rows; you needUPDATE … SET col = DEFAULT WHERE col IS NULL. - Sequence "holes" — auto-increment values are consumed on every insert attempt (even failed ones); holes are normal and benign.
-
AUTO_INCREMENTresets onTRUNCATE— confirm this is desired before truncating tables with FK references.
SQL
Topic — database
DEFAULT / IDENTITY drills
SQL
Topic — sql
Generated columns SQL library
5. ALTER TABLE — ADD, DROP, MODIFY, RENAME column patterns
alter table sql — evolve the schema without dropping the data
sql alter table is the vocabulary for evolving a table after launch. Every action (ADD COLUMN, DROP COLUMN, ALTER COLUMN TYPE, RENAME) has its own cost profile, and knowing which ones rewrite the whole table vs which ones just patch metadata is what separates the senior data engineer from the junior who locks a 100GB table for an hour.
ALTER TABLE … ADD COLUMN — the most common evolution.
-
Signature —
ALTER TABLE t ADD COLUMN <col_name> <type> [<constraints>]. -
Postgres 11+ —
ADD COLUMNwith a constantDEFAULTis metadata-only (no rewrite); a non-constant default rewrites. -
MySQL 8 online DDL —
ALTER TABLE … ADD COLUMN, ALGORITHM=INPLACE, LOCK=NONEfor non-blocking adds. -
SQL Server — adding a
NULLcolumn is metadata-only; addingNOT NULLwithDEFAULTrewrites. -
Safe pattern — add the column as nullable first, backfill, then add
NOT NULL.
ALTER TABLE … DROP COLUMN — usually fast.
-
Signature —
ALTER TABLE t DROP COLUMN <col_name>. -
Postgres — metadata-only; the column data isn't reclaimed until
VACUUM FULLorpg_repack. - MySQL — usually requires a table rewrite (algorithm-dependent).
- SQL Server — metadata-only by default.
-
The risk —
DROP COLUMNis irreversible without a backup; never drop columns the application still references.
ALTER TABLE … ALTER COLUMN TYPE — the expensive one.
-
Signature —
ALTER TABLE t ALTER COLUMN c TYPE <new_type>(Postgres) /MODIFY COLUMN c <new_type>(MySQL) /ALTER COLUMN c <new_type>(SQL Server). - Cost — typically requires a full table rewrite (every row's bytes change); locks the table for the duration on most engines.
-
USING <expression>— Postgres lets you specify how to convert the old values:ALTER COLUMN amount TYPE NUMERIC USING amount::numeric. - Safe pattern — add a new column, backfill with the converted values, swap the application reads, drop the old column.
ALTER TABLE … RENAME COLUMN and RENAME TO.
-
RENAME COLUMN old_name TO new_name— metadata-only; near-instant on every dialect. -
RENAME TO new_table_name— also metadata-only; near-instant. - The catch — every view / stored procedure / application query referencing the old name breaks immediately; coordinate the rename with code deploys.
ALTER TABLE … ADD CONSTRAINT / DROP CONSTRAINT.
-
Add FK —
ALTER TABLE orders ADD CONSTRAINT fk_orders_customer FOREIGN KEY (customer_id) REFERENCES customers (customer_id). -
Add CHECK —
ALTER TABLE orders ADD CONSTRAINT chk_amount CHECK (amount > 0). -
Add UNIQUE —
ALTER TABLE users ADD CONSTRAINT uq_users_email UNIQUE (email). -
Drop by name —
ALTER TABLE orders DROP CONSTRAINT fk_orders_customer— always name your constraints up front. -
Postgres
NOT VALID—ALTER TABLE … ADD CONSTRAINT … CHECK (…) NOT VALIDadds the constraint without validating existing rows; youVALIDATE CONSTRAINTlater when convenient (skips the table-scan lock).
ALTER TABLE … ALTER COLUMN SET NOT NULL.
- Postgres — table scan to verify no existing NULL; can be slow on large tables.
-
Workaround —
ADD CHECK (col IS NOT NULL) NOT VALID, thenVALIDATE CONSTRAINT, then convert toNOT NULL(avoids the long lock). - SQL Server / MySQL — same effective scan; same caveats.
ALTER TABLE actions by dialect — the cheat-bullet.
-
PostgreSQL —
ADD COLUMN,DROP COLUMN,ALTER COLUMN,RENAME COLUMN,RENAME CONSTRAINT,ADD CONSTRAINT,DROP CONSTRAINT,SET / DROP DEFAULT,SET / DROP NOT NULL. -
MySQL —
ADD COLUMN,DROP COLUMN,MODIFY COLUMN,CHANGE COLUMN(rename + retype),RENAME COLUMN,RENAME TO,ADD INDEX,DROP INDEX. -
SQL Server —
ADD <col>,DROP COLUMN,ALTER COLUMN,sp_rename(for column / table rename),ADD CONSTRAINT,DROP CONSTRAINT. -
Oracle — same actions;
MODIFYkeyword for type changes. -
Snowflake — supports most actions; some types (
VARIANT,OBJECT) have specific evolution paths.
SQL
Topic — database
ALTER TABLE SQL drills
SQL
Topic — sql
Schema migration patterns
6. Indexes paired with table design — CREATE INDEX
create index sql — every access pattern deserves an index
Indexes are not "constraints" but they live inside the same schema-design conversation. Every query that hits a column in WHERE, JOIN, or ORDER BY either uses an index or scans the table; the difference between sub-millisecond and multi-second response time is whether the right indexes exist.
The CREATE INDEX signature.
-
B-tree (default) —
CREATE INDEX idx_orders_created ON orders (created_at). -
Composite —
CREATE INDEX idx_orders_customer_created ON orders (customer_id, created_at). -
Unique —
CREATE UNIQUE INDEX uq_users_email ON users (email)— implicitly enforces uniqueness, like aUNIQUEconstraint. -
Partial / Filtered —
CREATE INDEX idx_active_orders ON orders (created_at) WHERE status = 'active'(Postgres) /WHEREclause syntax (SQL Server filtered indexes). -
Functional / Expression —
CREATE INDEX idx_users_lower_email ON users (LOWER(email))for case-insensitive lookup.
Index types beyond B-tree.
-
PostgreSQL —
BTREE,HASH,GIN(for JSONB / array),GIST(geometry / range types),BRIN(block-range for huge sorted tables),SP-GIST. -
MySQL —
BTREE,HASH(memory engine only),FULLTEXT(text search),SPATIAL(geometry). -
SQL Server —
CLUSTERED(defines the row's physical order; one per table),NONCLUSTERED(default),COLUMNSTORE(analytics). - The interview-canonical answer — B-tree is the default and right choice 95% of the time; reach for others only when the query pattern demands it.
CREATE INDEX CONCURRENTLY (Postgres) — non-blocking index build.
- Builds the index without locking the table for writes — at the cost of a longer total build time (two passes).
- Required for production index adds on busy tables.
-
MySQL equivalent —
ALTER TABLE … ADD INDEX, ALGORITHM=INPLACE, LOCK=NONE. -
SQL Server equivalent —
CREATE INDEX … WITH (ONLINE = ON).
Index selection rules of thumb.
- Every FK column should be indexed — speeds up parent-delete checks and child-scan joins.
-
Every column in
WHERE,JOIN,ORDER BYis a candidate; profile before adding more. -
Composite indexes match leftmost-prefix queries —
INDEX (a, b)acceleratesWHERE a = ?,WHERE a = ? AND b = ?, but NOTWHERE b = ?alone. -
Don't over-index OLTP tables — every index slows down
INSERT/UPDATE/DELETEproportionally. -
Cover columns in the
INCLUDEclause (Postgres 11+ / SQL Server) — avoid the heap-fetch step for index-only scans.
Common index pitfalls.
-
Function-wrapped columns defeat the index —
WHERE LOWER(email) = '…'won't useINDEX (email)unless the index is onLOWER(email). -
Implicit type coercion defeats the index —
WHERE int_col = '42'may not use the int index. See Blog78. -
Indexing low-cardinality columns is rarely useful —
WHERE status = 'paid'over a 50/50 distribution; the planner often picks a sequential scan instead. -
Stale stats — after a big load, run
ANALYZE(Postgres) orUPDATE STATISTICS(SQL Server) so the planner picks the right plan.
SQL
Topic — indexing
Indexing SQL drills
SQL
Topic — database
Index + schema patterns
7. Online migration gotchas — locks, type-change rewrites, FK chains
The seven bugs interviewers test most often on DDL changes
DDL has a small surface area but a long tail of edge cases that fail candidates — especially when the conversation moves from "design a new table" to "evolve a 100GB production table without downtime."
Gotcha 1 — ALTER TABLE ALTER COLUMN TYPE rewrites the whole table.
-
The bug —
ALTER COLUMN amount TYPE NUMERIC(14, 2)may hold an exclusive lock and rewrite every row for the duration. - Symptom — production app blocked for minutes to hours.
- Fix — add a new column, backfill in batches, swap reads, drop the old column.
Gotcha 2 — adding NOT NULL requires a full table scan.
-
The bug —
ALTER COLUMN … SET NOT NULLscans every row to confirm none areNULL; holds a lock for the duration. -
Fix (Postgres) —
ADD CHECK (col IS NOT NULL) NOT VALID→VALIDATE CONSTRAINT(background) → drop the check +SET NOT NULLafter the column is known clean.
Gotcha 3 — CREATE INDEX (without CONCURRENTLY) locks writes.
-
The bug —
CREATE INDEXtakes an exclusive write lock on the table for the duration. -
Fix —
CREATE INDEX CONCURRENTLY(Postgres) orALGORITHM=INPLACE, LOCK=NONE(MySQL) orONLINE = ON(SQL Server).
Gotcha 4 — ON DELETE CASCADE propagates further than expected.
- The bug — deleting one customer cascades through orders, payments, reviews, audit-log entries.
- Symptom — single-row delete takes seconds and wipes out unrelated data.
-
Fix — use
ON DELETE RESTRICTby default; soft-delete via adeleted_atcolumn; explicit cascade only when business semantics require it.
Gotcha 5 — RENAME COLUMN breaks every dependent view / proc.
- The bug — the rename is metadata-only and fast, but every view, stored procedure, application query referencing the old name fails immediately.
- Fix — coordinate the rename with the code deploy; or add the new column, dual-write, then drop the old (zero-downtime rename).
Gotcha 6 — DROP COLUMN is irreversible.
-
The bug —
DROP COLUMNremoves the data immediately on most engines; noUNDROP. - Fix — take a backup or snapshot first; verify the column is truly unused via query log / application audit.
Gotcha 7 — Missing index on FK columns kills join performance.
-
The behaviour —
ON DELETE CASCADEand JOINs both scan the child table by FK; without an index, every parent operation isΘ(n). - Symptom — small table joins suddenly become slow as the child table grows.
- Fix — index every FK column at table creation time; this is the single highest-impact "indexing free win" in schema design.
SQL
Topic — database
DDL migration drills
SQL
Topic — indexing
Index + migration patterns
Choosing the right DDL pattern (cheat sheet)
A one-screen cheat sheet for using SQL CREATE TABLE and ALTER TABLE — pick the pattern that matches your intent.
| You want to … | DDL | Notes |
|---|---|---|
| Create a table with a surrogate PK | CREATE TABLE t (id BIGSERIAL PRIMARY KEY, …) |
Postgres BIGSERIAL; MySQL BIGINT AUTO_INCREMENT; SQL Server IDENTITY(1,1)
|
| Add a foreign key | customer_id BIGINT REFERENCES customers (customer_id) |
Use ON DELETE RESTRICT by default |
| Add a check constraint | CHECK (amount > 0) |
Name it: CONSTRAINT chk_amount
|
| Add a default | status VARCHAR(20) DEFAULT 'pending' |
Pairs with NOT NULL
|
| Auto-populate created_at | created_at TIMESTAMPTZ DEFAULT NOW() |
MySQL CURRENT_TIMESTAMP
|
| Add a new column safely | ALTER TABLE t ADD COLUMN c <type> |
Add nullable, backfill, then SET NOT NULL
|
| Drop a column | ALTER TABLE t DROP COLUMN c |
Irreversible; back up first |
| Rename a column | ALTER TABLE t RENAME COLUMN old TO new |
Metadata-only; coordinate with code |
| Change a column's type |
ALTER TABLE t ALTER COLUMN c TYPE <new_type> USING … (PG) |
Rewrites the table; prefer new column + backfill |
| Add an index without locking |
CREATE INDEX CONCURRENTLY (PG) / ALGORITHM=INPLACE, LOCK=NONE (MySQL) |
Production-safe |
| Add NOT NULL safely (PG) |
ADD CHECK (col IS NOT NULL) NOT VALID → VALIDATE CONSTRAINT → SET NOT NULL
|
Avoids long-lock scan |
| Create a composite unique key |
UNIQUE (col_a, col_b) at table-level |
Multi-column natural keys |
| Add a generated column | total NUMERIC GENERATED ALWAYS AS (qty * price) STORED |
Postgres / MySQL / SQL Server |
| Create a partition by range | CREATE TABLE t (…) PARTITION BY RANGE (created_at) |
For very large fact tables |
Frequently asked questions
What's the difference between PRIMARY KEY and UNIQUE in SQL?
PRIMARY KEY uniquely identifies a row in a table; it's implicitly NOT NULL and UNIQUE, you can only have one per table, and the engine automatically creates an index on its columns. UNIQUE simply enforces "no two rows have the same value in this column (or column combination)" — but the column can be NULL (in most dialects, multiple NULLs are allowed because NULL = NULL is UNKNOWN), and a table can have many UNIQUE constraints. The canonical interview pattern: surrogate PRIMARY KEY (auto-increment BIGSERIAL / IDENTITY) for internal joins, plus business UNIQUE constraints (like email) for natural-key lookups. This separation gives you stable join keys and human-friendly uniqueness in one schema.
What's the difference between DROP TABLE, TRUNCATE TABLE, and DELETE FROM?
These are three different operators with similar-looking effects. DELETE FROM t WHERE … removes rows that match the predicate — row-by-row, fully logged, supports rollback in a transaction. TRUNCATE TABLE t removes every row in one fast minimally-logged operation, usually resets identity sequences, doesn't fire row-level triggers, and (in PostgreSQL / SQL Server) can be rolled back inside a transaction but NOT in MySQL InnoDB. DROP TABLE t removes the table itself — rows, schema, indexes, constraints — and is irreversible without a backup. See Blog79 §4 for the full comparison; the short answer is: DELETE for row-level with rollback, TRUNCATE for fast whole-table reset, DROP for removing the table definition.
How do I add a column to an existing table without downtime?
The safe alter table sql pattern depends on the dialect and the column's nullability. In PostgreSQL 11+, ALTER TABLE t ADD COLUMN c <type> DEFAULT <constant> is metadata-only — instant and non-blocking. For NOT NULL columns, add the column as nullable, backfill in batches (UPDATE … WHERE id BETWEEN x AND y), then convert to NOT NULL using ADD CHECK (col IS NOT NULL) NOT VALID followed by VALIDATE CONSTRAINT and finally SET NOT NULL (avoids the long-lock full-table scan). In MySQL 8, ALTER TABLE … ADD COLUMN …, ALGORITHM=INPLACE, LOCK=NONE is the equivalent non-blocking shape. In SQL Server, adding a nullable column is metadata-only; adding NOT NULL with DEFAULT rewrites the table — use the same nullable-then-backfill pattern.
When should I add a foreign key vs handle the relationship in application code?
Add a FOREIGN KEY whenever the data has a true referential relationship that the engine should enforce. The benefits are real: orphaned rows become impossible, JOINs have well-defined cardinality, and the FK column gets indexed (which speeds up the join in both directions). The cost is a small per-insert validation overhead plus the operational tax of ON DELETE cascading rules — use ON DELETE RESTRICT by default to make accidental data loss impossible, and reach for CASCADE only when business semantics demand it. The only reasons to skip FKs are very-high-throughput OLTP where the validation cost is measurable, or multi-database architectures where the parent and child live in different databases (FKs can't cross database boundaries). For 95% of data-engineering work, declare the FK at table creation time and let the engine enforce it.
What index types should I use for a typical OLTP table?
The default is B-tree — 95% of indexes in a typical OLTP table are B-trees and the choice is right. Index every primary key (automatic), every foreign key column (always — this is the single highest-impact "free win" in schema design), and every column that appears regularly in WHERE, JOIN, or ORDER BY. For multi-column queries, prefer a composite index matching the leftmost-prefix of the query pattern (INDEX (region, created_at) accelerates WHERE region = ? AND created_at > ?). Use partial / filtered indexes for queries that always filter on a constant (WHERE status = 'active') — the index is smaller and faster. Reach for GIN (Postgres) / FULLTEXT (MySQL) for JSON or text search; BRIN (Postgres) for time-series fact tables that are mostly append-only; CLUSTERED COLUMNSTORE (SQL Server) for analytics workloads. Don't over-index: every index adds write overhead, so profile first, add indexes second.
Practice on PipeCode
PipeCode ships 450+ data-engineering interview problems — including PostgreSQL-first SQL practice keyed to DDL (CREATE TABLE, ALTER TABLE), constraints (PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK, NOT NULL), indexing, migration safety, and the dialect quirks that fail candidates who memorise only one engine. Whether you're drilling sql interview questions with answers for sql for data engineers loops or grinding through data engineering interview questions end-to-end, the practice library mirrors the same constraint-first schema-design mental model this guide teaches.
Kick off via Explore practice →; drill the database SQL lane →; fan out into the indexing SQL lane →; rehearse data-manipulation patterns →; reinforce type-handling SQL drills →; widen coverage on the full SQL practice library →.





Top comments (0)