CodeWithDhanian

Posted on May 24

Day 4: Creating Tables – Data Types, NULL, and DEFAULT Constraints

#sql

The Foundation of Every Relational Database: Why Table Creation Matters

When you build any production application—whether it is a customer management system, an inventory tracker, or a full-scale e-commerce platform—the table becomes the core structural unit that holds your data. A table in a relational database is not just a spreadsheet; it is a precisely defined container that enforces rules about what data can live inside it, how that data is stored, and how the database engine will interact with it for years to come.

The CREATE TABLE statement is the very first place where you, as a developer or architect, make decisions that affect performance, data integrity, storage costs, query speed, and even the long-term maintainability of your entire system. Today we will explore this statement in exhaustive detail: from the underlying theory of data types to the subtle but critical behavior of NULL values and the practical power of DEFAULT constraints. We will examine real engineering trade-offs, internal database workflows, and the exact syntax you will use in both PostgreSQL and MySQL—the two databases you set up on Day 2.

The CREATE TABLE Statement: Syntax, Structure, and Execution Flow

At its core, the CREATE TABLE command instructs the database management system to allocate storage space, define column metadata, and register the table within the database’s system catalog. The engine parses your statement, validates data-type compatibility, applies any immediate constraints, and then creates the physical files (or pages) that will eventually hold rows.

The basic structure looks like this:

CREATE TABLE table_name (
    column_name1 data_type [constraints],
    column_name2 data_type [constraints],
    ...
);

Every column definition consists of three mandatory parts and one optional but extremely important part:

The column name (must be unique within the table)
The data type (determines storage format, range, and behavior)
Optional constraints such as NOT NULL or DEFAULT

When the database engine processes this statement, it performs several internal steps in sequence:

Acquires a schema lock to prevent concurrent modifications.
Validates that the chosen data types exist and are compatible with the storage engine.
Allocates space in the heap (the main table storage area).
Records the table definition in the system catalog tables (pg_class and pg_attribute in PostgreSQL, or information_schema.columns in both engines).
If the table is created successfully, it becomes immediately queryable.

Choosing the wrong data type or forgetting a constraint at this stage can force you to run expensive ALTER TABLE operations later—which lock the table and can cause downtime in high-traffic systems.

Data Types: The Language Your Database Speaks

Every value you store must be declared with an explicit data type. The database engine uses this declaration to decide how much disk space to reserve, how to encode the value in memory, how to compare values during queries, and how to index the column efficiently. Selecting the correct data type is both an art and a science; it directly impacts storage costs, query performance, and application correctness.

Numeric Data Types

Integer types are the workhorses of most applications. In PostgreSQL you have SMALLINT (2 bytes, ±32,767), INTEGER (4 bytes, ±2.1 billion), and BIGINT (8 bytes, ±9 quintillion). MySQL offers the nearly identical TINYINT, SMALLINT, MEDIUMINT, INT, and BIGINT.

For fractional numbers, use NUMERIC(precision, scale) (also called DECIMAL in MySQL). This type stores exact decimal values—critical for financial data—unlike REAL or DOUBLE PRECISION, which use floating-point approximation and can introduce rounding errors that accumulate over millions of transactions.

Real-world example: an order total column should never use DOUBLE PRECISION; always use NUMERIC(12,2) to guarantee penny-level accuracy across years of compounding calculations.

Character Data Types

Text storage comes in two families. VARCHAR(n) stores variable-length strings up to n characters and is the default choice in modern applications. TEXT (or LONGTEXT in MySQL) removes the length limit entirely and is ideal for product descriptions or user comments.

CHAR(n) is fixed-length and right-pads with spaces; it can be slightly faster for very short, consistently sized fields (e.g., country codes), but it wastes space otherwise. In high-scale systems, the performance difference between VARCHAR and TEXT is negligible because both are stored out-of-line when they grow large, but the choice still matters for index size and memory usage during sorting.

Date and Time Data Types

DATE, TIME, TIMESTAMP, and TIMESTAMPTZ (timestamp with time zone in PostgreSQL) are essential. PostgreSQL’s TIMESTAMPTZ automatically normalizes everything to UTC internally while preserving the original time zone for display—an architectural decision that prevents daylight-saving bugs that plague many legacy MySQL applications using plain DATETIME.

Always store timestamps with time zone awareness unless your entire system operates in a single fixed timezone. The internal workflow is simple: the engine converts the input to UTC on write and converts back on read based on the client’s session timezone.

Boolean and Specialized Types

The BOOLEAN type (or BOOL) stores true, false, or NULL. It occupies only one byte yet saves you from the classic “0/1 magic number” anti-pattern that leads to bugs when someone later interprets 2 as “maybe.”

PostgreSQL also offers JSONB for semi-structured data and UUID for globally unique identifiers—both of which we will use in production examples later in the series.

NULL: The Concept of “Value Unknown”

NULL is not a value; it is the explicit absence of a value. The SQL standard treats NULL through three-valued logic (true, false, unknown). This means that any comparison involving NULL—even NULL = NULL—evaluates to unknown, not true.

From an architectural standpoint, the database engine stores a special null bitmap for each row. When a column is NULL, the engine skips storing any data for that column, saving space. However, this bitmap adds a tiny overhead to every row, which becomes measurable only at billions of rows.

In real systems, NULL represents legitimate business states: “user has not yet set a middle name,” “order has not yet shipped,” or “product has no discount expiration date.” Misusing NULL for “zero” or “empty string” is one of the most common sources of incorrect analytics.

Enforcing Presence with NOT NULL Constraints

The NOT NULL constraint tells the database engine to reject any INSERT or UPDATE that would leave the column without a value. This is enforced at write time, before the row reaches disk.

Placing NOT NULL on columns that your application logic depends on (email, created_at, status) prevents entire classes of runtime errors. It also allows the query planner to generate more efficient execution plans because the engine knows the column will never be unknown.

Supplying Sensible Defaults with the DEFAULT Constraint

The DEFAULT constraint provides an automatic value when none is supplied during an INSERT. The engine evaluates the default expression at write time and substitutes it seamlessly.

Common realistic defaults include:

CURRENT_TIMESTAMP or NOW() for audit columns
A literal string such as 'active' for status columns
A numeric literal such as 0.00 for price fields
A complex expression such as gen_random_uuid() in PostgreSQL

MySQL and PostgreSQL both support DEFAULT on almost every data type, but PostgreSQL allows far richer expressions (including subqueries and functions) because its parser is more expressive.

Building a Production-Grade Table: A Complete Realistic Example

Let us create a users table that reflects real engineering decisions made every day in SaaS companies.

-- PostgreSQL version
CREATE TABLE users (
    id              BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email           VARCHAR(255) NOT NULL,
    full_name       VARCHAR(255),
    username        VARCHAR(100) UNIQUE,
    status          VARCHAR(20)  DEFAULT 'pending' NOT NULL,
    created_at      TIMESTAMPTZ  DEFAULT NOW() NOT NULL,
    last_login_at   TIMESTAMPTZ,
    preferences     JSONB        DEFAULT '{}'::JSONB,
    is_verified     BOOLEAN      DEFAULT FALSE NOT NULL
);

-- Equivalent MySQL version (note AUTO_INCREMENT and slight syntax differences)
CREATE TABLE users (
    id              BIGINT AUTO_INCREMENT PRIMARY KEY,
    email           VARCHAR(255) NOT NULL,
    full_name       VARCHAR(255),
    username        VARCHAR(100) UNIQUE,
    status          VARCHAR(20)  DEFAULT 'pending' NOT NULL,
    created_at      DATETIME(3)  DEFAULT CURRENT_TIMESTAMP(3) NOT NULL,
    last_login_at   DATETIME(3),
    preferences     JSON         DEFAULT (JSON_OBJECT()) NOT NULL,
    is_verified     TINYINT(1)   DEFAULT 0 NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Walk through the logic:

id uses an auto-incrementing or identity column so the engine guarantees uniqueness without application-level coordination.
email is NOT NULL because every user must have one; the database engine will reject any attempt to create a row without it.
status has a DEFAULT so new users start in a safe pending state even if the application forgets to specify it.
created_at defaults to the exact moment of insertion, providing a reliable audit trail without requiring application code to remember to set it.
preferences as JSONB (PostgreSQL) or JSON (MySQL) allows flexible storage of user settings without schema changes.

When you execute this statement, the database engine creates the table, builds the internal row header with the null bitmap, reserves space for the default values, and registers the column metadata so future queries can leverage type information for optimization.

Engineering Trade-offs, Performance Implications, and Best Practices

Choosing data types and constraints is never free.

Wider numeric types (BIGINT vs INTEGER) double storage and index size but prevent overflow bugs that crash production systems.
Variable-length types (VARCHAR, TEXT) are more space-efficient but require an extra length prefix and can fragment pages over time.
NOT NULL columns allow the query planner to skip null-check branches, improving index usage and reducing CPU cycles.
Over-use of DEFAULT expressions that call functions (e.g., NOW()) adds microseconds per row; in a table receiving 10,000 inserts per second, this becomes measurable.

Best practice: declare the narrowest, most restrictive data type and the strictest constraints that still match real business rules. Run EXPLAIN ANALYZE on representative queries after creation to validate that the engine is using the types and constraints you intended. In high-scale environments, these early decisions determine whether your database stays on a single node or needs sharding later.

For teams working in both PostgreSQL and MySQL, maintain separate migration scripts or use an abstraction layer that understands the subtle syntax differences—especially around default timestamps and identity columns.

If you want to take these concepts from theory to production-grade mastery with dozens of additional exercises, real schema evolution patterns, and battle-tested migration strategies, grab the comprehensive SQL Playbook at https://codewithdhanian.gumroad.com/l/hjmix —it was built exactly for engineers who want to move beyond tutorials and ship reliable systems.