Gowtham Potureddi

Posted on May 23

CREATE TABLE & ALTER TABLE in SQL: Schema Design for Data Engineers

#sql #python #interview #dataengineering

sql create table defines the shape of every relation in a database — columns and their types, primary keys and unique constraints, foreign-key references, NOT NULL guarantees, DEFAULT values, and CHECK predicates that the engine enforces on every write. sql alter table is how that shape changes after launch — adding columns, dropping columns, modifying types, renaming tables, attaching new constraints. Together they form sql ddl (Data Definition Language), the sql schema design vocabulary every sql interview questions panel tests when the conversation moves past SELECT and into "how would you model this."

This guide walks through every clause in the create table sql family that reviewers love to test in data engineering interview questions: the full CREATE TABLE signature with column types and constraints, the PRIMARY KEY / FOREIGN KEY / NOT NULL / UNIQUE / CHECK constraint catalogue, DEFAULT values and computed columns, the alter table sql vocabulary (ADD COLUMN, DROP COLUMN, ALTER COLUMN, RENAME, ADD CONSTRAINT), create index sql patterns paired with table design, online-migration safety on production tables, and the seven gotchas (locking, type-change rewrites, dropped indexes, FK cascade chains) that fail most candidates. Every section ends as sql interview questions with answers: a runnable PostgreSQL DDL plus DML query, a traced execution, an output table, and a concept-by-concept why this works breakdown — the exact shape sql for data engineers rounds reward when ddl in sql comes up.

When you want hands-on reps immediately after reading, browse SQL practice library →, drill the database problems →, sharpen indexing SQL drills →, rehearse data-manipulation patterns →, or widen coverage on the full SQL practice library →.

On this page

Why CREATE TABLE and ALTER TABLE matter in data engineering interviews
Anatomy of CREATE TABLE — columns, types, and the constraint catalogue
PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK, NOT NULL
DEFAULT values, computed columns, and identity / SERIAL
ALTER TABLE — ADD, DROP, MODIFY, RENAME column patterns
Indexes paired with table design — CREATE INDEX
Online migration gotchas — locks, type-change rewrites, FK chains
Choosing the right DDL pattern (cheat sheet)
Frequently asked questions
Practice on PipeCode

1. Why CREATE TABLE and ALTER TABLE matter in data engineering interviews

Schema design is what separates juniors from seniors — every constraint is a future bug prevented

The one-sentence invariant: sql create table defines the relation's shape and the constraints the engine enforces on every write; sql alter table evolves that shape over time without dropping the data. Once you internalise that — and learn which constraints belong in the schema vs in the application layer — every prompt in the sql schema design family becomes a question of "what does this data actually require to be valid?"

The CREATE TABLE mental model in three bullets.

Columns + types define the row shape — order_id BIGINT, region VARCHAR(10), amount NUMERIC(12,2), created_at TIMESTAMPTZ.
Constraints define what makes a row valid — PRIMARY KEY, FOREIGN KEY, NOT NULL, UNIQUE, CHECK, DEFAULT.
Indexes are not "constraints" but live alongside the table — PRIMARY KEY and UNIQUE implicitly create them; CREATE INDEX adds them explicitly.

The five constraint types in the catalogue.

PRIMARY KEY — uniquely identifies a row; implicitly NOT NULL + UNIQUE; one per table; the most-asked create table sql constraint in interviews.
FOREIGN KEY — references another table's primary or unique key; enforces referential integrity (ON DELETE CASCADE / RESTRICT / SET NULL cascade rules).
NOT NULL — guarantees the column is never NULL; the cheapest, most-impactful constraint to add.
UNIQUE — guarantees no duplicate values across rows; one or many UNIQUE columns per table.
CHECK (predicate) — engine evaluates the predicate on every write; row must satisfy it.

Why interviewers love DDL questions.

It surfaces data modelling instincts — the right primary key, the right level of normalisation, the right column types.
It tests dialect knowledge — SERIAL (Postgres) vs AUTO_INCREMENT (MySQL) vs IDENTITY (SQL Server); JSONB (Postgres) vs JSON (MySQL) vs VARIANT (Snowflake).
It probes migration safety — does the candidate know that ALTER TABLE ALTER COLUMN TYPE can rewrite a multi-gigabyte table?
It hits operational realism — CREATE INDEX CONCURRENTLY, ALTER TABLE … VALIDATE CONSTRAINT, online DDL on MySQL.

What interviewers listen for.

Do you name the primary key explicitly and explain why? — "I'm using (order_id) because it's the natural unique identifier."
Do you think about NULLability before declaring every column? — "region is NOT NULL because the application never writes an unknown region."
Do you mention FK ON DELETE behaviour when designing relationships? — senior signal.
Do you reach for CREATE INDEX CONCURRENTLY when adding an index on a busy production table? — bonus points (Postgres).

Worked example — design and populate an `orders` table

Detailed explanation. Realistic table design starts from the questions the table will answer ("what's the revenue per region last week?") and the cardinality of the data ("a few million rows a year"). The constraints encode what every row must satisfy; the column types match the natural domain.

Question. Design an orders table holding order_id, customer_id (foreign key), region, amount, status, created_at. Include a primary key, a foreign key to a customers(customer_id) table, NOT NULL on every column, a CHECK that amount > 0, a DEFAULT for status, and an index on (created_at) for time-range queries.

Code (PostgreSQL).

CREATE TABLE orders (
    order_id    BIGSERIAL PRIMARY KEY,
    customer_id BIGINT       NOT NULL
                             REFERENCES customers (customer_id) ON DELETE RESTRICT,
    region      VARCHAR(10)  NOT NULL,
    amount      NUMERIC(12,2) NOT NULL CHECK (amount > 0),
    status      VARCHAR(20)  NOT NULL DEFAULT 'pending',
    created_at  TIMESTAMPTZ  NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_orders_created_at ON orders (created_at);
CREATE INDEX idx_orders_customer   ON orders (customer_id);

Step-by-step explanation.

order_id BIGSERIAL PRIMARY KEY — auto-incrementing 8-byte integer primary key.
customer_id with REFERENCES customers (customer_id) ON DELETE RESTRICT — FK that blocks deleting a customer who has orders.
region, amount, status, created_at — all NOT NULL to make missing values impossible.
CHECK (amount > 0) — engine refuses any insert / update that violates the predicate.
DEFAULT 'pending' and DEFAULT NOW() — sensible auto-values when omitted.
Two explicit CREATE INDEX statements for the common access patterns (time range and customer-scoped queries).

Output (after one insert).

order_id	customer_id	region	amount	status	created_at
1	42	US	100.00	pending	2026-05-23 14:30:00+00

Rule of thumb: design the table from the queries it will serve, not from the source data's shape; constraints are cheaper to add at CREATE TABLE time than after launch.

SQL
Topic — database
Schema design SQL drills

Practice →

SQL
Topic — sql
SQL practice library

Practice →

2. Anatomy of CREATE TABLE — columns, types, and the constraint catalogue

`create table sql` — the full signature, slice by slice

Before any specific constraint, the mental model: a CREATE TABLE statement is a slice-by-slice declaration of column-by-column shape, terminated by optional table-level constraints. Read it top-to-bottom and you've described every guarantee the engine will enforce.

The full CREATE TABLE signature.

CREATE TABLE <table_name> (
    <col_name> <data_type> [<column_constraints>],
    <col_name> <data_type> [<column_constraints>],
    …,
    [<table_constraints>]
);

Table name — schema-qualified (schema.table) or unqualified (uses the default schema).
Column definitions — type plus zero or more column-level constraints.
Table-level constraints — composite PK, FK, UNIQUE, CHECK spanning multiple columns.

Common data types — the cross-dialect cheat.

Integer — INTEGER (32-bit), BIGINT (64-bit), SMALLINT (16-bit); the universal numeric tier.
Decimal — NUMERIC(p, s) / DECIMAL(p, s) — fixed precision for currency; p total digits, s scale.
Float — REAL (single), DOUBLE PRECISION (double); for scientific computation, not money.
String — VARCHAR(n) (max length n), TEXT (unbounded in Postgres / MySQL).
Date / time — DATE, TIMESTAMP, TIMESTAMPTZ (Postgres) / DATETIME (MySQL / SQL Server). See Blog74.
Boolean — BOOLEAN (Postgres / MySQL 8), BIT (SQL Server).
JSON — JSONB (Postgres, indexed), JSON (MySQL), VARIANT (Snowflake).
UUID — UUID (Postgres native), UNIQUEIDENTIFIER (SQL Server), CHAR(36) portable.

Column-level vs table-level constraints.

Column-level — apply to one column, declared inline: email VARCHAR(255) UNIQUE NOT NULL.
Table-level — apply to one or many columns, declared after the column list: UNIQUE (col_a, col_b), PRIMARY KEY (col_a, col_b).
Composite keys require table-level form — PRIMARY KEY (region, customer_id) is a two-column PK.
Named constraints — CONSTRAINT chk_amount_positive CHECK (amount > 0) — gives the constraint a name you can reference in ALTER TABLE … DROP CONSTRAINT.

Optional clauses around CREATE TABLE.

CREATE TABLE IF NOT EXISTS — idempotent; useful in migration scripts.
CREATE TEMPORARY TABLE — session-scoped table dropped on session end.
CREATE TABLE … AS SELECT (CTAS) — create and populate in one statement; column types inferred from the source query.
CREATE TABLE … (LIKE other_table) — copy schema from another table (Postgres / MySQL).
CREATE TABLE … PARTITION BY RANGE / LIST / HASH — partitioned tables for very large fact tables.

Dialect divergence — the things every interview tests.

Auto-increment — SERIAL / BIGSERIAL / IDENTITY (Postgres 10+); AUTO_INCREMENT (MySQL); IDENTITY(1,1) (SQL Server); SEQUENCE (Oracle).
String types — TEXT is preferred in Postgres (no penalty vs VARCHAR); MySQL has VARCHAR(n) size matter for index byte-budget.
Boolean — Postgres / MySQL 8 native BOOLEAN; SQL Server uses BIT (0/1).
JSON — Postgres JSONB is fully indexed; MySQL JSON is binary but indexable via generated columns.
Timestamps with TZ — Postgres TIMESTAMPTZ; Snowflake TIMESTAMP_TZ; SQL Server DATETIMEOFFSET. See Blog74 §2.

SQL
Topic — database
CREATE TABLE drills

Practice →

SQL
Topic — type-handling
Data-type SQL practice

Practice →

3. PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK, NOT NULL

`sql constraints` — five rules the engine enforces on every write

The five constraint types are the heart of sql schema design. Each one is a guarantee the database makes about every row that lands in the table — and every guarantee is a class of bug the application layer no longer needs to defend against.

PRIMARY KEY — the canonical identifier.

Signature — <col> <type> PRIMARY KEY (column-level) or PRIMARY KEY (col_a, col_b) (table-level for composite).
Implicit guarantees — NOT NULL and UNIQUE.
One per table — there can only be one PRIMARY KEY constraint.
Implicit index — every PRIMARY KEY automatically creates an index on its columns.
Choice of key type — surrogate (auto-increment BIGSERIAL / IDENTITY) vs natural (an existing unique business value like email); surrogate is the safer default.

FOREIGN KEY — referential integrity.

Column-level — customer_id BIGINT REFERENCES customers (customer_id).
Table-level — FOREIGN KEY (customer_id) REFERENCES customers (customer_id).
ON DELETE actions — CASCADE (delete dependent rows), RESTRICT / NO ACTION (block the parent delete), SET NULL (null out the FK), SET DEFAULT (revert to the column's default).
ON UPDATE actions — same options for parent-key updates.
Performance cost — every insert into the child table checks the parent for the referenced row; ensure the referenced column is indexed (it is by default if it's the parent's PK).
sql foreign key is one of the most-asked create table sql interview prompts.

NOT NULL — the cheapest, most-impactful constraint.

Inline — region VARCHAR(10) NOT NULL.
Default behaviour — columns are nullable unless NOT NULL is declared.
Catches missing values upstream — better than discovering NULLs in downstream reports.
Almost-always-on rule — columns should be NOT NULL unless missing data is semantically meaningful.

UNIQUE — no duplicates allowed.

Column-level — email VARCHAR(255) UNIQUE.
Table-level — UNIQUE (col_a, col_b) for composite uniqueness.
Implicit index — every UNIQUE constraint creates an index.
NULL exception — most engines allow multiple NULLs in a UNIQUE column (because NULL = NULL is UNKNOWN); SQL Server treats NULL as a single value for UNIQUE purposes.
The natural-key pattern — when a business value (like email) is already unique, add a UNIQUE constraint on top of the surrogate PRIMARY KEY.

CHECK (predicate) — domain-level validation.

Column-level — amount NUMERIC(12,2) CHECK (amount > 0).
Table-level — CHECK (start_date < end_date) referencing multiple columns.
Engine evaluates on every insert / update — refuses the row if the predicate is FALSE (returns UNKNOWN for NULL).
Use cases — non-negative amounts, valid status enum values, sane date ranges, length bounds.
MySQL caveat — supports CHECK syntax since 5.7 but only enforces it since 8.0.16.

DEFAULT <expr> — implicit value on omitted insert.

Inline — status VARCHAR(20) NOT NULL DEFAULT 'pending'.
Common patterns — DEFAULT NOW() (Postgres) / DEFAULT CURRENT_TIMESTAMP (MySQL / SQL Server) for created_at columns.
Generated default — DEFAULT uuid_generate_v4() for UUID PKs.
Pairs cleanly with NOT NULL — the engine fills in the value when the insert omits the column.

sql primary key vs foreign key — the interview comparison.

PRIMARY KEY — uniquely identifies rows in this table.
FOREIGN KEY — references the primary (or unique) key of another table.
PK is local, FK is referential — that single sentence is the textbook interview answer.

SQL interview question — orders + customers schema with FK and cascading rules

Assume you need customers(customer_id, email, name, status) and orders(order_id, customer_id, amount, status, created_at). Design both tables with primary keys, a foreign key from orders.customer_id to customers.customer_id, an FK action that blocks deleting a customer who has orders, and constraints that prevent invalid data.

Solution Using CREATE TABLE + named constraints

Code (PostgreSQL).

CREATE TABLE customers (
    customer_id BIGSERIAL    PRIMARY KEY,
    email       VARCHAR(255) NOT NULL UNIQUE,
    name        VARCHAR(100) NOT NULL,
    status      VARCHAR(20)  NOT NULL DEFAULT 'active'
                             CHECK (status IN ('active', 'suspended', 'closed'))
);

CREATE TABLE orders (
    order_id    BIGSERIAL     PRIMARY KEY,
    customer_id BIGINT        NOT NULL,
    amount      NUMERIC(12,2) NOT NULL CHECK (amount > 0),
    status      VARCHAR(20)   NOT NULL DEFAULT 'pending'
                              CHECK (status IN ('pending', 'paid', 'refund', 'cancel')),
    created_at  TIMESTAMPTZ   NOT NULL DEFAULT NOW(),
    CONSTRAINT fk_orders_customer
        FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
        ON DELETE RESTRICT
);

CREATE INDEX idx_orders_customer ON orders (customer_id);
CREATE INDEX idx_orders_created  ON orders (created_at);

Step-by-step trace.

step	object	outcome
1	`customers` PK + UNIQUE on email	natural-key uniqueness plus surrogate PK
2	`customers.status CHECK`	only the three enum values allowed
3	`orders` PK + named FK + CHECKs	full referential integrity plus domain checks
4	`ON DELETE RESTRICT`	prevents deleting a customer who has any order
5	Two indexes on `orders`	match the common query access patterns

Output: the schema enforces every invariant at write time; the application never has to defend against unknown statuses or orphaned orders.

Why this works — concept by concept:

BIGSERIAL surrogate PK — auto-incrementing 8-byte integer; the safe default unless the data has a natural key.
UNIQUE on customers.email — enforces business-level uniqueness while the surrogate handles internal joins.
CHECK status IN (…) — engine refuses any insert / update with an unknown status (cheaper than application-layer validation).
FK ON DELETE RESTRICT — blocks the parent delete; the senior alternative to silent cascade chains.
Named FK constraint — CONSTRAINT fk_orders_customer lets you ALTER TABLE … DROP CONSTRAINT later if needed.
Two explicit indexes — match real query access patterns (customer-scoped and time-range queries).
Cost — index maintenance is Θ(log n) per insert; the constraint enforcement is constant per row.

SQL
Topic — database
Constraints SQL drills

Practice →

SQL
Topic — sql
Foreign-key SQL patterns

Practice →

4. DEFAULT values, computed columns, and identity / SERIAL

`default value sql` and auto-generated columns — let the engine fill in what you don't write

DEFAULT is the unsung hero of schema design. Every column that has a "sensible value when missing" should have a DEFAULT clause — created_at defaults to NOW(), status defaults to 'pending', surrogate IDs default to the next sequence value. The pattern eliminates entire classes of "I forgot to set this column" bugs.

DEFAULT <literal> — the simplest form.

String — status VARCHAR(20) NOT NULL DEFAULT 'pending'.
Number — retry_count INTEGER NOT NULL DEFAULT 0.
Boolean — is_active BOOLEAN NOT NULL DEFAULT TRUE.
NULL — optional_col VARCHAR(50) DEFAULT NULL (rarely needed; the absence of NOT NULL already permits NULL).

DEFAULT <expression> — dynamic defaults.

Current timestamp — created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() (Postgres) / DEFAULT CURRENT_TIMESTAMP (MySQL / SQL Server).
UUID — id UUID PRIMARY KEY DEFAULT gen_random_uuid() (Postgres 13+) / DEFAULT (UUID()) (MySQL 8).
Sequence — order_id BIGINT NOT NULL DEFAULT nextval('orders_seq') (Postgres explicit sequence).

Auto-increment / IDENTITY / SERIAL — surrogate primary keys.

PostgreSQL — BIGSERIAL (legacy) and BIGINT GENERATED ALWAYS AS IDENTITY (SQL standard, since PG 10).
MySQL — BIGINT AUTO_INCREMENT PRIMARY KEY.
SQL Server — BIGINT IDENTITY(1,1) PRIMARY KEY.
Oracle — NUMBER GENERATED BY DEFAULT AS IDENTITY or a SEQUENCE + trigger.
Snowflake — NUMBER AUTOINCREMENT START 1 INCREMENT 1.

The choice of integer type for the PK.

INTEGER (32-bit, max 2.1B) — safe for moderate-volume tables; will overflow on very-large-scale OLTP.
BIGINT (64-bit, max 9.2 × 10¹⁸) — the safe default for new tables; effectively never overflows.
UUID — globally unique without coordination; larger (16 bytes vs 8); harder to index; useful for distributed systems.
The senior rule of thumb — BIGINT surrogate PK + business-key UNIQUE constraint.

Computed / generated columns — values derived from other columns.

PostgreSQL — total NUMERIC GENERATED ALWAYS AS (quantity * unit_price) STORED.
MySQL — total NUMERIC AS (quantity * unit_price) STORED (or VIRTUAL for non-materialised).
SQL Server — total AS (quantity * unit_price) PERSISTED.
Use cases — denormalised aggregates, JSON path extraction ((payload->>'amount')::numeric), uppercase-shadow columns for case-insensitive search.

identity column sql — the modern SQL-standard form.

<col> <type> GENERATED ALWAYS AS IDENTITY — the column is always system-generated; manual inserts to it error.
<col> <type> GENERATED BY DEFAULT AS IDENTITY — auto-generated unless an explicit value is provided.
Supported in — Postgres 10+, Oracle 12c+, DB2; gradually replacing dialect-specific syntax.

Common DEFAULT pitfalls.

DEFAULT with non-deterministic functions — NOW() is fine (evaluated at insert time), but RANDOM() may evaluate once at definition (dialect-specific).
DEFAULT does not back-fill existing rows — adding DEFAULT to a column doesn't change existing rows; you need UPDATE … SET col = DEFAULT WHERE col IS NULL.
Sequence "holes" — auto-increment values are consumed on every insert attempt (even failed ones); holes are normal and benign.
AUTO_INCREMENT resets on TRUNCATE — confirm this is desired before truncating tables with FK references.

SQL
Topic — database
DEFAULT / IDENTITY drills

Practice →

SQL
Topic — sql
Generated columns SQL library

Practice →

5. ALTER TABLE — ADD, DROP, MODIFY, RENAME column patterns

`alter table sql` — evolve the schema without dropping the data

sql alter table is the vocabulary for evolving a table after launch. Every action (ADD COLUMN, DROP COLUMN, ALTER COLUMN TYPE, RENAME) has its own cost profile, and knowing which ones rewrite the whole table vs which ones just patch metadata is what separates the senior data engineer from the junior who locks a 100GB table for an hour.

ALTER TABLE … ADD COLUMN — the most common evolution.

Signature — ALTER TABLE t ADD COLUMN <col_name> <type> [<constraints>].
Postgres 11+ — ADD COLUMN with a constant DEFAULT is metadata-only (no rewrite); a non-constant default rewrites.
MySQL 8 online DDL — ALTER TABLE … ADD COLUMN, ALGORITHM=INPLACE, LOCK=NONE for non-blocking adds.
SQL Server — adding a NULL column is metadata-only; adding NOT NULL with DEFAULT rewrites.
Safe pattern — add the column as nullable first, backfill, then add NOT NULL.

ALTER TABLE … DROP COLUMN — usually fast.

Signature — ALTER TABLE t DROP COLUMN <col_name>.
Postgres — metadata-only; the column data isn't reclaimed until VACUUM FULL or pg_repack.
MySQL — usually requires a table rewrite (algorithm-dependent).
SQL Server — metadata-only by default.
The risk — DROP COLUMN is irreversible without a backup; never drop columns the application still references.

ALTER TABLE … ALTER COLUMN TYPE — the expensive one.

Signature — ALTER TABLE t ALTER COLUMN c TYPE <new_type> (Postgres) / MODIFY COLUMN c <new_type> (MySQL) / ALTER COLUMN c <new_type> (SQL Server).
Cost — typically requires a full table rewrite (every row's bytes change); locks the table for the duration on most engines.
USING <expression> — Postgres lets you specify how to convert the old values: ALTER COLUMN amount TYPE NUMERIC USING amount::numeric.
Safe pattern — add a new column, backfill with the converted values, swap the application reads, drop the old column.

ALTER TABLE … RENAME COLUMN and RENAME TO.

RENAME COLUMN old_name TO new_name — metadata-only; near-instant on every dialect.
RENAME TO new_table_name — also metadata-only; near-instant.
The catch — every view / stored procedure / application query referencing the old name breaks immediately; coordinate the rename with code deploys.

ALTER TABLE … ADD CONSTRAINT / DROP CONSTRAINT.

Add FK — ALTER TABLE orders ADD CONSTRAINT fk_orders_customer FOREIGN KEY (customer_id) REFERENCES customers (customer_id).
Add CHECK — ALTER TABLE orders ADD CONSTRAINT chk_amount CHECK (amount > 0).
Add UNIQUE — ALTER TABLE users ADD CONSTRAINT uq_users_email UNIQUE (email).
Drop by name — ALTER TABLE orders DROP CONSTRAINT fk_orders_customer — always name your constraints up front.
Postgres NOT VALID — ALTER TABLE … ADD CONSTRAINT … CHECK (…) NOT VALID adds the constraint without validating existing rows; you VALIDATE CONSTRAINT later when convenient (skips the table-scan lock).

ALTER TABLE … ALTER COLUMN SET NOT NULL.

Postgres — table scan to verify no existing NULL; can be slow on large tables.
Workaround — ADD CHECK (col IS NOT NULL) NOT VALID, then VALIDATE CONSTRAINT, then convert to NOT NULL (avoids the long lock).
SQL Server / MySQL — same effective scan; same caveats.

ALTER TABLE actions by dialect — the cheat-bullet.

PostgreSQL — ADD COLUMN, DROP COLUMN, ALTER COLUMN, RENAME COLUMN, RENAME CONSTRAINT, ADD CONSTRAINT, DROP CONSTRAINT, SET / DROP DEFAULT, SET / DROP NOT NULL.
MySQL — ADD COLUMN, DROP COLUMN, MODIFY COLUMN, CHANGE COLUMN (rename + retype), RENAME COLUMN, RENAME TO, ADD INDEX, DROP INDEX.
SQL Server — ADD <col>, DROP COLUMN, ALTER COLUMN, sp_rename (for column / table rename), ADD CONSTRAINT, DROP CONSTRAINT.
Oracle — same actions; MODIFY keyword for type changes.
Snowflake — supports most actions; some types (VARIANT, OBJECT) have specific evolution paths.

SQL
Topic — database
ALTER TABLE SQL drills

Practice →

SQL
Topic — sql
Schema migration patterns

Practice →

6. Indexes paired with table design — CREATE INDEX

`create index sql` — every access pattern deserves an index

Indexes are not "constraints" but they live inside the same schema-design conversation. Every query that hits a column in WHERE, JOIN, or ORDER BY either uses an index or scans the table; the difference between sub-millisecond and multi-second response time is whether the right indexes exist.

The CREATE INDEX signature.

B-tree (default) — CREATE INDEX idx_orders_created ON orders (created_at).
Composite — CREATE INDEX idx_orders_customer_created ON orders (customer_id, created_at).
Unique — CREATE UNIQUE INDEX uq_users_email ON users (email) — implicitly enforces uniqueness, like a UNIQUE constraint.
Partial / Filtered — CREATE INDEX idx_active_orders ON orders (created_at) WHERE status = 'active' (Postgres) / WHERE clause syntax (SQL Server filtered indexes).
Functional / Expression — CREATE INDEX idx_users_lower_email ON users (LOWER(email)) for case-insensitive lookup.

Index types beyond B-tree.

PostgreSQL — BTREE, HASH, GIN (for JSONB / array), GIST (geometry / range types), BRIN (block-range for huge sorted tables), SP-GIST.
MySQL — BTREE, HASH (memory engine only), FULLTEXT (text search), SPATIAL (geometry).
SQL Server — CLUSTERED (defines the row's physical order; one per table), NONCLUSTERED (default), COLUMNSTORE (analytics).
The interview-canonical answer — B-tree is the default and right choice 95% of the time; reach for others only when the query pattern demands it.

CREATE INDEX CONCURRENTLY (Postgres) — non-blocking index build.

Builds the index without locking the table for writes — at the cost of a longer total build time (two passes).
Required for production index adds on busy tables.
MySQL equivalent — ALTER TABLE … ADD INDEX, ALGORITHM=INPLACE, LOCK=NONE.
SQL Server equivalent — CREATE INDEX … WITH (ONLINE = ON).

Index selection rules of thumb.

Every FK column should be indexed — speeds up parent-delete checks and child-scan joins.
Every column in WHERE, JOIN, ORDER BY is a candidate; profile before adding more.
Composite indexes match leftmost-prefix queries — INDEX (a, b) accelerates WHERE a = ?, WHERE a = ? AND b = ?, but NOT WHERE b = ? alone.
Don't over-index OLTP tables — every index slows down INSERT / UPDATE / DELETE proportionally.
Cover columns in the INCLUDE clause (Postgres 11+ / SQL Server) — avoid the heap-fetch step for index-only scans.

Common index pitfalls.

Function-wrapped columns defeat the index — WHERE LOWER(email) = '…' won't use INDEX (email) unless the index is on LOWER(email).
Implicit type coercion defeats the index — WHERE int_col = '42' may not use the int index. See Blog78.
Indexing low-cardinality columns is rarely useful — WHERE status = 'paid' over a 50/50 distribution; the planner often picks a sequential scan instead.
Stale stats — after a big load, run ANALYZE (Postgres) or UPDATE STATISTICS (SQL Server) so the planner picks the right plan.

SQL
Topic — indexing
Indexing SQL drills

Practice →

SQL
Topic — database
Index + schema patterns

Practice →

7. Online migration gotchas — locks, type-change rewrites, FK chains

The seven bugs interviewers test most often on DDL changes

DDL has a small surface area but a long tail of edge cases that fail candidates — especially when the conversation moves from "design a new table" to "evolve a 100GB production table without downtime."

Gotcha 1 — ALTER TABLE ALTER COLUMN TYPE rewrites the whole table.

The bug — ALTER COLUMN amount TYPE NUMERIC(14, 2) may hold an exclusive lock and rewrite every row for the duration.
Symptom — production app blocked for minutes to hours.
Fix — add a new column, backfill in batches, swap reads, drop the old column.

Gotcha 2 — adding NOT NULL requires a full table scan.

The bug — ALTER COLUMN … SET NOT NULL scans every row to confirm none are NULL; holds a lock for the duration.
Fix (Postgres) — ADD CHECK (col IS NOT NULL) NOT VALID → VALIDATE CONSTRAINT (background) → drop the check + SET NOT NULL after the column is known clean.

Gotcha 3 — CREATE INDEX (without CONCURRENTLY) locks writes.

The bug — CREATE INDEX takes an exclusive write lock on the table for the duration.
Fix — CREATE INDEX CONCURRENTLY (Postgres) or ALGORITHM=INPLACE, LOCK=NONE (MySQL) or ONLINE = ON (SQL Server).

Gotcha 4 — ON DELETE CASCADE propagates further than expected.

The bug — deleting one customer cascades through orders, payments, reviews, audit-log entries.
Symptom — single-row delete takes seconds and wipes out unrelated data.
Fix — use ON DELETE RESTRICT by default; soft-delete via a deleted_at column; explicit cascade only when business semantics require it.

Gotcha 5 — RENAME COLUMN breaks every dependent view / proc.

The bug — the rename is metadata-only and fast, but every view, stored procedure, application query referencing the old name fails immediately.
Fix — coordinate the rename with the code deploy; or add the new column, dual-write, then drop the old (zero-downtime rename).

Gotcha 6 — DROP COLUMN is irreversible.

The bug — DROP COLUMN removes the data immediately on most engines; no UNDROP.
Fix — take a backup or snapshot first; verify the column is truly unused via query log / application audit.

Gotcha 7 — Missing index on FK columns kills join performance.

The behaviour — ON DELETE CASCADE and JOINs both scan the child table by FK; without an index, every parent operation is Θ(n).
Symptom — small table joins suddenly become slow as the child table grows.
Fix — index every FK column at table creation time; this is the single highest-impact "indexing free win" in schema design.

SQL
Topic — database
DDL migration drills

Practice →

SQL
Topic — indexing
Index + migration patterns

Practice →

Choosing the right DDL pattern (cheat sheet)

A one-screen cheat sheet for using SQL CREATE TABLE and ALTER TABLE — pick the pattern that matches your intent.

You want to …	DDL	Notes
Create a table with a surrogate PK	`CREATE TABLE t (id BIGSERIAL PRIMARY KEY, …)`	Postgres `BIGSERIAL`; MySQL `BIGINT AUTO_INCREMENT`; SQL Server `IDENTITY(1,1)`
Add a foreign key	`customer_id BIGINT REFERENCES customers (customer_id)`	Use `ON DELETE RESTRICT` by default
Add a check constraint	`CHECK (amount > 0)`	Name it: `CONSTRAINT chk_amount`
Add a default	`status VARCHAR(20) DEFAULT 'pending'`	Pairs with `NOT NULL`
Auto-populate created_at	`created_at TIMESTAMPTZ DEFAULT NOW()`	MySQL `CURRENT_TIMESTAMP`
Add a new column safely	`ALTER TABLE t ADD COLUMN c <type>`	Add nullable, backfill, then `SET NOT NULL`
Drop a column	`ALTER TABLE t DROP COLUMN c`	Irreversible; back up first
Rename a column	`ALTER TABLE t RENAME COLUMN old TO new`	Metadata-only; coordinate with code
Change a column's type	`ALTER TABLE t ALTER COLUMN c TYPE <new_type> USING …` (PG)	Rewrites the table; prefer new column + backfill
Add an index without locking	`CREATE INDEX CONCURRENTLY` (PG) / `ALGORITHM=INPLACE, LOCK=NONE` (MySQL)	Production-safe
Add NOT NULL safely (PG)	`ADD CHECK (col IS NOT NULL) NOT VALID` → `VALIDATE CONSTRAINT` → `SET NOT NULL`	Avoids long-lock scan
Create a composite unique key	`UNIQUE (col_a, col_b)` at table-level	Multi-column natural keys
Add a generated column	`total NUMERIC GENERATED ALWAYS AS (qty * price) STORED`	Postgres / MySQL / SQL Server
Create a partition by range	`CREATE TABLE t (…) PARTITION BY RANGE (created_at)`	For very large fact tables

Frequently asked questions

What's the difference between PRIMARY KEY and UNIQUE in SQL?

PRIMARY KEY uniquely identifies a row in a table; it's implicitly NOT NULL and UNIQUE, you can only have one per table, and the engine automatically creates an index on its columns. UNIQUE simply enforces "no two rows have the same value in this column (or column combination)" — but the column can be NULL (in most dialects, multiple NULLs are allowed because NULL = NULL is UNKNOWN), and a table can have many UNIQUE constraints. The canonical interview pattern: surrogate PRIMARY KEY (auto-increment BIGSERIAL / IDENTITY) for internal joins, plus business UNIQUE constraints (like email) for natural-key lookups. This separation gives you stable join keys and human-friendly uniqueness in one schema.

What's the difference between DROP TABLE, TRUNCATE TABLE, and DELETE FROM?

These are three different operators with similar-looking effects. DELETE FROM t WHERE … removes rows that match the predicate — row-by-row, fully logged, supports rollback in a transaction. TRUNCATE TABLE t removes every row in one fast minimally-logged operation, usually resets identity sequences, doesn't fire row-level triggers, and (in PostgreSQL / SQL Server) can be rolled back inside a transaction but NOT in MySQL InnoDB. DROP TABLE t removes the table itself — rows, schema, indexes, constraints — and is irreversible without a backup. See Blog79 §4 for the full comparison; the short answer is: DELETE for row-level with rollback, TRUNCATE for fast whole-table reset, DROP for removing the table definition.

How do I add a column to an existing table without downtime?

The safe alter table sql pattern depends on the dialect and the column's nullability. In PostgreSQL 11+, ALTER TABLE t ADD COLUMN c <type> DEFAULT <constant> is metadata-only — instant and non-blocking. For NOT NULL columns, add the column as nullable, backfill in batches (UPDATE … WHERE id BETWEEN x AND y), then convert to NOT NULL using ADD CHECK (col IS NOT NULL) NOT VALID followed by VALIDATE CONSTRAINT and finally SET NOT NULL (avoids the long-lock full-table scan). In MySQL 8, ALTER TABLE … ADD COLUMN …, ALGORITHM=INPLACE, LOCK=NONE is the equivalent non-blocking shape. In SQL Server, adding a nullable column is metadata-only; adding NOT NULL with DEFAULT rewrites the table — use the same nullable-then-backfill pattern.

When should I add a foreign key vs handle the relationship in application code?

Add a FOREIGN KEY whenever the data has a true referential relationship that the engine should enforce. The benefits are real: orphaned rows become impossible, JOINs have well-defined cardinality, and the FK column gets indexed (which speeds up the join in both directions). The cost is a small per-insert validation overhead plus the operational tax of ON DELETE cascading rules — use ON DELETE RESTRICT by default to make accidental data loss impossible, and reach for CASCADE only when business semantics demand it. The only reasons to skip FKs are very-high-throughput OLTP where the validation cost is measurable, or multi-database architectures where the parent and child live in different databases (FKs can't cross database boundaries). For 95% of data-engineering work, declare the FK at table creation time and let the engine enforce it.

What index types should I use for a typical OLTP table?

The default is B-tree — 95% of indexes in a typical OLTP table are B-trees and the choice is right. Index every primary key (automatic), every foreign key column (always — this is the single highest-impact "free win" in schema design), and every column that appears regularly in WHERE, JOIN, or ORDER BY. For multi-column queries, prefer a composite index matching the leftmost-prefix of the query pattern (INDEX (region, created_at) accelerates WHERE region = ? AND created_at > ?). Use partial / filtered indexes for queries that always filter on a constant (WHERE status = 'active') — the index is smaller and faster. Reach for GIN (Postgres) / FULLTEXT (MySQL) for JSON or text search; BRIN (Postgres) for time-series fact tables that are mostly append-only; CLUSTERED COLUMNSTORE (SQL Server) for analytics workloads. Don't over-index: every index adds write overhead, so profile first, add indexes second.

Practice on PipeCode

PipeCode ships 450+ data-engineering interview problems — including PostgreSQL-first SQL practice keyed to DDL (CREATE TABLE, ALTER TABLE), constraints (PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK, NOT NULL), indexing, migration safety, and the dialect quirks that fail candidates who memorise only one engine. Whether you're drilling sql interview questions with answers for sql for data engineers loops or grinding through data engineering interview questions end-to-end, the practice library mirrors the same constraint-first schema-design mental model this guide teaches.

Kick off via Explore practice →; drill the database SQL lane →; fan out into the indexing SQL lane →; rehearse data-manipulation patterns →; reinforce type-handling SQL drills →; widen coverage on the full SQL practice library →.

DEV Community

CREATE TABLE & ALTER TABLE in SQL: Schema Design for Data Engineers

1. Why CREATE TABLE and ALTER TABLE matter in data engineering interviews

Schema design is what separates juniors from seniors — every constraint is a future bug prevented

Worked example — design and populate an `orders` table

2. Anatomy of CREATE TABLE — columns, types, and the constraint catalogue

`create table sql` — the full signature, slice by slice

3. PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK, NOT NULL

`sql constraints` — five rules the engine enforces on every write

SQL interview question — orders + customers schema with FK and cascading rules

Solution Using CREATE TABLE + named constraints

4. DEFAULT values, computed columns, and identity / SERIAL

`default value sql` and auto-generated columns — let the engine fill in what you don't write

5. ALTER TABLE — ADD, DROP, MODIFY, RENAME column patterns

`alter table sql` — evolve the schema without dropping the data

6. Indexes paired with table design — CREATE INDEX

`create index sql` — every access pattern deserves an index

7. Online migration gotchas — locks, type-change rewrites, FK chains

The seven bugs interviewers test most often on DDL changes

Choosing the right DDL pattern (cheat sheet)

Frequently asked questions

What's the difference between PRIMARY KEY and UNIQUE in SQL?

What's the difference between DROP TABLE, TRUNCATE TABLE, and DELETE FROM?

How do I add a column to an existing table without downtime?

When should I add a foreign key vs handle the relationship in application code?

What index types should I use for a typical OLTP table?

Practice on PipeCode

Top comments (0)

1. Why CREATE TABLE and ALTER TABLE matter in data engineering interviews

Schema design is what separates juniors from seniors — every constraint is a future bug prevented

Worked example — design and populate an orders table

2. Anatomy of CREATE TABLE — columns, types, and the constraint catalogue

create table sql — the full signature, slice by slice

3. PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK, NOT NULL

sql constraints — five rules the engine enforces on every write

SQL interview question — orders + customers schema with FK and cascading rules

Solution Using CREATE TABLE + named constraints

4. DEFAULT values, computed columns, and identity / SERIAL

default value sql and auto-generated columns — let the engine fill in what you don't write

5. ALTER TABLE — ADD, DROP, MODIFY, RENAME column patterns

alter table sql — evolve the schema without dropping the data

6. Indexes paired with table design — CREATE INDEX

create index sql — every access pattern deserves an index

7. Online migration gotchas — locks, type-change rewrites, FK chains

The seven bugs interviewers test most often on DDL changes

Choosing the right DDL pattern (cheat sheet)

Frequently asked questions

What's the difference between PRIMARY KEY and UNIQUE in SQL?

What's the difference between DROP TABLE, TRUNCATE TABLE, and DELETE FROM?

How do I add a column to an existing table without downtime?

When should I add a foreign key vs handle the relationship in application code?

What index types should I use for a typical OLTP table?

Practice on PipeCode

Worked example — design and populate an `orders` table

`create table sql` — the full signature, slice by slice

`sql constraints` — five rules the engine enforces on every write

`default value sql` and auto-generated columns — let the engine fill in what you don't write

`alter table sql` — evolve the schema without dropping the data

`create index sql` — every access pattern deserves an index